[PATCH] D106447: [DAGCombine] DAGTypeLegalizer::GenWidenVectorLoads(): make use of dereferenceability knowledge

Fri Jul 23 02:15:31 PDT 2021

lebedev.ri added a comment.

Thanks for taking a look!

In D106447#2899434 <https://reviews.llvm.org/D106447#2899434>, @sebastian-ne wrote:

> Hi,
> we talked a bit about this internally and found some potential problems.

In D106447#2899434 <https://reviews.llvm.org/D106447#2899434>, @sebastian-ne wrote:

> There were some concerns that assuming that dereferenceable bytes >= alignment is incorrect. Memory protection may work at 4-byte or even byte granularity and objects may be over-aligned in memory. In these cases, loading padding bytes does not work.
>
> Even if it can be proven that enough bytes are dereferenceable,

This is FUD.

  /// isDereferenceable - Return true if V is always dereferenceable for
  /// Offset + Size byte.
  bool MachinePointerInfo::isDereferenceable(<...>

  /// Returns true if V is always dereferenceable for Size byte with alignment
  /// greater or equal than requested. If the context instruction is specified
  /// performs context-sensitive analysis and returns true if the pointer is
  /// dereferenceable at the specified instruction.
  bool isDereferenceableAndAlignedPointer(<...>

  /// Check if executing a load of this pointer value cannot trap.
  ///
  /// If DT and ScanFrom are specified this method performs context-sensitive
  /// analysis and returns true if it is safe to load immediately before ScanFrom.
  ///
  /// If it is not obviously safe to load from the specified pointer, we do
  /// a quick local scan of the basic block containing \c ScanFrom, to determine
  /// if the address is already accessed.
  ///
  /// This uses the pointee type to determine how many bytes need to be safe to
  /// load from the pointer.
  bool llvm::isSafeToLoadUnconditionally(Value *V, Align Alignment, APInt &Size,
                                         const DataLayout &DL,
                                         Instruction *ScanFrom,
                                         const DominatorTree *DT,
                                         const TargetLibraryInfo *TLI) {
    // If DT is not specified we can't make context-sensitive query
    const Instruction* CtxI = DT ? ScanFrom : nullptr;
    if (isDereferenceableAndAlignedPointer(V, Alignment, Size, DL, CtxI, DT, TLI))
      return true;

We can not second-guess LLVM semantics.
If we can tell that we are always allowed to load N bytes without trapping, then we can do that.
If we actually can't, then the problem is elsewhere, we shouldn't have deduced that we can.

> we do not want to widen loads for amdgpu (at least not in cases where the load gets a lot larger, i.e. loading 64 Bytes instead of 32). Widening a load poses more restrictions on the register allocator, as a larger consecutive set of registers needs to be allocated. And, in the case that a widened load hits more cache lines than before, it also consumes more memory bandwidth.

Ok, so we need a target hook, that's doable.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106447/new/

https://reviews.llvm.org/D106447