[PATCH] D24934: [LICM] Add support of a new optimization case to Loop Versioning for LICM + code clean up

Evgeny Astigeevich via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 26 11:56:27 PDT 2016


eastig created this revision.
eastig added reviewers: hfinkel, anemet, ashutosh.nema.
eastig added subscribers: llvm-commits, jmolloy.
Herald added subscribers: mzolotukhin, sanjoy.

At the moment, Loop Versioning for LICM does not support the following loops which, if versioned, improve benchmark scores by roughly 18-40% on Cortex-M7:

```
void mem_copy_01(char **dst, char **src, int bytes_count) {
    while (bytes_count--)
    {
      *((*dst)++) = *((*src)++);
    }
}

void mem_copy_02(char **dst, char *src, int bytes_count) {
    while (bytes_count--)
    {
      *((*dst)++) = *src++;
    }
}

void mem_copy_03(char *dst, char **src, int bytes_count) {
    while (bytes_count--)
    {
      *dst++ = *((*src)++);
    }
}

```
IR of mem_copy_01:
```
define void @mem_copy_01(i8** nocapture %dst, i8** nocapture %src, i32 %bytes_count) {
entry:
  %tobool2 = icmp eq i32 %bytes_count, 0
  br i1 %tobool2, label %while.end, label %while.body

while.body:                                       ; preds = %entry, %while.body
  %bytes_count.addr.03 = phi i32 [ %dec, %while.body ], [ %bytes_count, %entry ]
  %dec = add nsw i32 %bytes_count.addr.03, -1
  %0 = load i8*, i8** %src, align 4, !tbaa !3
  %incdec.ptr = getelementptr inbounds i8, i8* %0, i32 1
  store i8* %incdec.ptr, i8** %src, align 4, !tbaa !3
  %1 = load i8, i8* %0, align 1, !tbaa !7
  %2 = load i8*, i8** %dst, align 4, !tbaa !3
  %incdec.ptr1 = getelementptr inbounds i8, i8* %2, i32 1
  store i8* %incdec.ptr1, i8** %dst, align 4, !tbaa !3
  store i8 %1, i8* %2, align 1, !tbaa !7
  %tobool = icmp eq i32 %dec, 0
  br i1 %tobool, label %while.end, label %while.body

while.end:                                        ; preds = %while.body, %entry
  ret void
}
```
LoopAccessAnalysis can create aliasing checks for src and dst but not for *src and *dst, because *src and *dst are loaded from memory inside the loop. The IR above shows how these pointers are defined and used (InvPtr is a loop-invariant pointer):
```
Ptr = Load(InvPtr)
NewPtr = GEP(Ptr, Const)
Store(NewPtr, InvPtr)
Mem_operations using Ptr
```

# If Ptr and InvPtr do not alias at iteration N, then at iteration N+1 the value of Ptr is the value defined by the GEP instruction.
# Without aliasing, Ptr takes values from [Ptr0, Ptr0 + (number_of_iterations-1) * type_size * GEP_index], where Ptr0 is Load(InvPtr) at the first iteration.

Absence of aliasing means:

```
4_or_8_bytes_aligned(Ptr0) != InvPtr                          : iteration 1
4_or_8_bytes_aligned(Ptr0 + type_size*GEP_index) != InvPtr    : iteration 2
4_or_8_bytes_aligned(Ptr0 + 2*type_size*GEP_index) != InvPtr  : iteration 3
...
```
The aligned address is used because InvPtr points to a pointer and is therefore aligned to either 4 or 8 bytes.
We can write a stricter check:
InvPtr is not in [4_or_8_bytes_aligned(Ptr0), Ptr0 + (number_of_iterations-1) * type_size * GEP_index]
which guarantees that all the checks above are satisfied.
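
To illustrate how the single range check relates to the per-iteration checks, here is a minimal C sketch. It is not code from the patch: the function names, the use of `uintptr_t` addresses, and the assumption that `step` (standing in for type_size * GEP_index) is positive are all made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to a multiple of align (align must be a power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t align) {
    return addr & ~(align - 1);
}

/* Per-iteration form: compare the aligned address used in every
   iteration against InvPtr. Hypothetical helper, for illustration only. */
static bool no_alias_per_iteration(uintptr_t inv_ptr, uintptr_t ptr0,
                                   size_t iterations, size_t step) {
    for (size_t i = 0; i < iterations; ++i)
        if (align_down(ptr0 + i * step, sizeof(void *)) == inv_ptr)
            return false;
    return true;
}

/* Stricter single check: InvPtr lies outside
   [align_down(Ptr0), Ptr0 + (iterations - 1) * step].
   Assumes step > 0 and that the address arithmetic does not wrap. */
static bool no_alias_range(uintptr_t inv_ptr, uintptr_t ptr0,
                           size_t iterations, size_t step) {
    if (iterations == 0)
        return true; /* the loop body never runs */
    uintptr_t lo = align_down(ptr0, sizeof(void *));
    uintptr_t hi = ptr0 + (iterations - 1) * step;
    return inv_ptr < lo || inv_ptr > hi;
}
```

Whenever `no_alias_range` returns true, `no_alias_per_iteration` returns true as well, because every aligned address tested by the per-iteration form lies inside the rejected range. The converse does not hold when `step` is larger than the pointer size, which is why the range check is the stricter of the two.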

We check aliasing only between pointers loaded from invariant locations and the pointers to those locations, which is enough to decide whether operations on invariant pointers can be moved out of a loop. Because the checks serve only LICM and do not cover all pointer combinations, they cannot be created or added in LoopAccessAnalysis/LoopVersioning. Instead, LoopAccessAnalysis/LoopVersioning should provide a means of processing unrecognized pointers and adding checks for them.
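
As a source-level picture of the intended effect, the following C sketch shows roughly what a versioned mem_copy_01 could look like. It is a hand-written approximation, not output of the pass: the helper `slot_outside_range`, the function name `mem_copy_01_versioned`, and the exact set of runtime checks are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when the pointer-sized slot at `slot` lies
   outside [aligned(ptr0), ptr0 + n - 1], the byte range walked by a
   pointer that starts at ptr0 and advances one byte per iteration.
   Requires n > 0. */
static int slot_outside_range(const void *slot, const char *ptr0, size_t n) {
    uintptr_t s = (uintptr_t)slot;
    uintptr_t lo = (uintptr_t)ptr0 & ~((uintptr_t)sizeof(char *) - 1);
    uintptr_t hi = (uintptr_t)ptr0 + (n - 1);
    return s < lo || s > hi;
}

void mem_copy_01_versioned(char **dst, char **src, int bytes_count) {
    if (bytes_count > 0) {
        size_t n = (size_t)bytes_count;
        char *d = *dst; /* Ptr0 loaded from the invariant location dst */
        char *s = *src; /* Ptr0 loaded from the invariant location src */
        /* Runtime checks: the two slots are distinct, and neither slot
           is read or written through the byte pointers. */
        if (dst != src &&
            slot_outside_range(dst, d, n) && slot_outside_range(dst, s, n) &&
            slot_outside_range(src, d, n) && slot_outside_range(src, s, n)) {
            /* Versioned loop: loads of *dst and *src are hoisted and the
               stores to them are sunk below the loop. */
            for (size_t i = 0; i < n; ++i)
                d[i] = s[i];
            *dst = d + n;
            *src = s + n;
            return;
        }
    }
    /* Original loop, kept as the fallback when a check fails. */
    while (bytes_count--) {
        *((*dst)++) = *((*src)++);
    }
}
```

The fast path copies byte by byte in the same forward order as the original loop, so overlapping source and destination buffers behave the same in both versions.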

Summary of changes:
# Clean up of the code of Loop Versioning for LICM. See comments to the changes below.
# LoopVersioning::versionLoop functions are changed to return the BasicBlock where RT checks are inserted. The returned basic block can be used for inserting additional checks.
# LoopAccessAnalysis can operate in a 'PartialCheckingAllowed' state, which means it creates RT checks for recognized pointers and collects unrecognized pointers. The unrecognized pointers can be processed by a user of LAA later.
# Recognition of the new optimization case is added to Loop Versioning for LICM.
# New tests are added.
# Old tests are updated.





https://reviews.llvm.org/D24934

Files:
  include/llvm/Analysis/LoopAccessAnalysis.h
  include/llvm/Transforms/Utils/LoopVersioning.h
  lib/Analysis/LoopAccessAnalysis.cpp
  lib/Transforms/Scalar/LoopVersioningLICM.cpp
  lib/Transforms/Utils/LoopVersioning.cpp
  test/Transforms/LoopVersioningLICM/copying-bytes-loop-01.ll
  test/Transforms/LoopVersioningLICM/copying-bytes-loop-02.ll
  test/Transforms/LoopVersioningLICM/loopversioningLICM1.ll
  test/Transforms/LoopVersioningLICM/loopversioningLICM2.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D24934.72477.patch
Type: text/x-patch
Size: 38034 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160926/e4ab1139/attachment.bin>

