[LLVMdev] Loop Vectorization and Store-Load Forwarding issue

Das, Dibyendu Dibyendu.Das at amd.com
Sat Jun 13 08:04:11 PDT 2015


Thx Gerolf. Let me investigate your suggestion.

From: Gerolf Hoflehner [mailto:ghoflehner at apple.com]
Sent: Saturday, June 13, 2015 3:37 AM
To: Das, Dibyendu; Adam Nemet
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Loop Vectorization and Store-Load Forwarding issue

+Adam

I’m seeing more cases where the compiler makes guesses about the processor rather than querying a machine model. Rather than a sophisticated model, there could be a basic, lightweight machine description file that is queried when it is available. In this specific example, a formula like 'dependence distance / width > store2load_fwd_delay' would help with conflict modeling. Does that sound like a promising path forward?
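
As an illustration only, the formula above could be sketched as a query against a small machine description. Everything named here (BasicMachineModel, Store2LoadFwdDelay, forwardingConflictUnlikely) is hypothetical and not an existing LLVM interface, and the unit of the delay is an assumption:

```cpp
#include <cassert>

// Hypothetical lightweight machine description; not an LLVM API.
struct BasicMachineModel {
  // Assumed meaning: number of vector-width steps after which a store is no
  // longer in the store-to-load forwarding path. Values are illustrative.
  unsigned Store2LoadFwdDelay;
};

// Applies 'dependence distance / width > store2load_fwd_delay'.
// Returns true when the dependent load is expected to issue late enough that
// a partial-overlap forwarding stall is unlikely. When no machine model is
// available (Model == nullptr), stays conservative and returns false.
inline bool forwardingConflictUnlikely(unsigned DistanceBytes,
                                       unsigned VectorWidthBytes,
                                       const BasicMachineModel *Model) {
  assert(VectorWidthBytes > 0 && "vector width must be non-zero");
  if (!Model)
    return false;
  return DistanceBytes / VectorWidthBytes > Model->Store2LoadFwdDelay;
}
```

With a distance of 120 bytes and a 16-byte vector, the quotient is 7, so a model reporting a delay of 4 would allow vectorization while a model reporting 8 would not.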

Cheers
Gerolf




On Jun 11, 2015, at 10:11 PM, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:

I have been looking into a small test case (Part B) for which loop vectorization is disabled because of a possible store-load forwarding conflict (the check is shown in Part A and the debug output in Part C). Because of the dependence at distance 2 (y[j-2]), the loop is vectorizable only at a width of 2. However, the dependence at distance 15 (y[j-15]) creates a store-load forwarding issue: the store packet y[16:17] (iteration j=16) partially overlaps the load packets y[15:16] (iteration j=30) and y[17:18] (iteration j=32). Such conflicts introduce additional delays in the store->load forwarding pipes, and this is modeled in the method MemoryDepChecker::couldPreventStoreLoadForward() in LoopAccessAnalysis.cpp, which may turn off vectorization when it detects them. Reading the code, I get the feeling that it is more conservative than it needs to be: if the dependence distance is large, the conflicting store may already have drained from the store pipe by the time the load is issued, in which case vectorization may be beneficial.
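
To make the packet overlap concrete, here is a small sketch that models each vector access as a half-open range of element indices. The names Packet, packetAt and partiallyOverlaps are made up for this illustration and do not come from LLVM:

```cpp
#include <cassert>

// Half-open range [Begin, End) of element indices touched by one vector access.
struct Packet {
  int Begin;
  int End;
};

// Packet touched by the vectorized access y[J + Offset] at vector iteration J
// with a vectorization factor of VF elements.
inline Packet packetAt(int J, int Offset, int VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  return {J + Offset, J + Offset + VF};
}

// True when the packets share at least one element but are not identical.
// Identical packets can be forwarded whole; partial overlap is the
// problematic case described above.
inline bool partiallyOverlaps(Packet A, Packet B) {
  bool Intersects = A.Begin < B.End && B.Begin < A.End;
  bool Identical = A.Begin == B.Begin && A.End == B.End;
  return Intersects && !Identical;
}
```

For VF = 2, the store at j=16 is packetAt(16, 0, 2), i.e. y[16:17]. The y[j-15] loads at j=30 and j=32 are packetAt(30, -15, 2) and packetAt(32, -15, 2), and both partially overlap the store. The y[j-2] load at j=18 covers exactly y[16:17], so it matches the store packet and is not a partial overlap.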

I see some performance improvement on x86 when I disable the method above, so I am seeking advice on how to improve the following logic (Part A). Can we model NumCyclesForStoreLoadThroughMemory better? Its current value may be far too high. Or are there other ways to work around the basic problem?

-TIA
Dibyendu

Part A:
  const unsigned NumCyclesForStoreLoadThroughMemory = 8*TypeByteSize;  // 8*8 = 64 for the i64 test case shown
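  // Maximum vector factor (in bytes), capped by the maximum safe
  // dependence distance.
  unsigned MaxVFWithoutSLForwardIssues = VectorizerParams::MaxVectorWidth * TypeByteSize;
  if (MaxSafeDepDistBytes < MaxVFWithoutSLForwardIssues)
    MaxVFWithoutSLForwardIssues = MaxSafeDepDistBytes;

  // Try each power-of-two width (in bytes), starting at two elements.
  for (unsigned vf = 2 * TypeByteSize; vf <= MaxVFWithoutSLForwardIssues; vf *= 2) {
    // A distance that is not a multiple of vf means partially overlapping
    // packets; treat it as a conflict if the quotient is below the threshold.
    if (Distance % vf && Distance / vf < NumCyclesForStoreLoadThroughMemory) {
      MaxVFWithoutSLForwardIssues = (vf >>= 1);
      break;
    }
  }

  // Not even a width of two elements avoids the conflict.
  if (MaxVFWithoutSLForwardIssues < 2 * TypeByteSize) {
    DEBUG(dbgs() << "LAA: Distance " << Distance <<
          " that could cause a store-load forwarding conflict\n");
    return true;
  }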
----------------------------
Part B:
typedef unsigned long long uint64;

void foo(const unsigned char *m, unsigned int block, uint64 y[80])
{
    const unsigned char *sblock;
    int i, j;

    for (i = 0; i < (int) block; i++) {
        sblock = m + (i << 7);

        for (j = 16; j < 80; j++) {
           y[j] = y[j - 2] + y[j - 15] ;
        }
    }
}
----------------------------
Part C:
Snippet from the debug dump during the LoopAccessAnalysis phase:

LAA: Checking memory dependencies
LAA: Src Scev: {(8 + %y),+,8}<%for.body3>Sink Scev: {(128 + %y),+,8}<nsw><%for.body3>(Induction step: 1)
LAA: Distance for   %3 = load i64, i64* %arrayidx6, align 8 to   store i64 %add, i64* %arrayidx8, align 8: 120
LAA: Distance 120 that could cause a store-load forwarding conflict
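
For reference, the check can be reproduced outside of LLVM with a standalone sketch. This is not the LLVM implementation: hasForwardingConflict is a made-up name, the MaxVectorWidth default of 64 is an assumption about VectorizerParams::MaxVectorWidth, and only the logic quoted in Part A is modeled:

```cpp
#include <cassert>

// Standalone re-statement of the logic quoted in Part A.
// Distance and MaxSafeDepDistBytes are in bytes. MaxVectorWidth defaults to
// 64, which is assumed to match VectorizerParams::MaxVectorWidth.
inline bool hasForwardingConflict(unsigned Distance, unsigned TypeByteSize,
                                  unsigned MaxSafeDepDistBytes,
                                  unsigned MaxVectorWidth = 64) {
  assert(TypeByteSize > 0 && "element size must be non-zero");
  const unsigned Threshold = 8 * TypeByteSize; // 64 for 8-byte elements
  unsigned MaxVF = MaxVectorWidth * TypeByteSize;
  if (MaxSafeDepDistBytes < MaxVF)
    MaxVF = MaxSafeDepDistBytes;

  // Walk the candidate widths in bytes: 2, 4, 8, ... elements.
  for (unsigned VF = 2 * TypeByteSize; VF <= MaxVF; VF *= 2) {
    if (Distance % VF && Distance / VF < Threshold) {
      MaxVF = VF >> 1; // fall back to the previous width
      break;
    }
  }
  // Conflict if even a width of two elements is ruled out.
  return MaxVF < 2 * TypeByteSize;
}
```

For the dump above, Distance is 120 bytes (15 elements of 8 bytes). The first candidate width is 16 bytes: 120 % 16 is 8 and 120 / 16 is 7, which is below the threshold of 64, so the width falls back to 8 bytes and hasForwardingConflict(120, 8, 16) returns true. The distance-2 dependence (16 bytes) is a multiple of 16, so hasForwardingConflict(16, 8, 16) returns false.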




