[LLVMdev] [DragonEgg] [Polly] Should we expect DragonEgg to produce identical LLVM IR for identical GIMPLE?
Dmitry Mikushin
dmitry at kernelgen.org
Sat Feb 9 14:46:53 PST 2013
Hi,
So, from LLVM IR point of view, Polly is confused by loads of loops
dimensions right from the corresponding loops headers:
"162": ; preds = %"168",
%"162.preheader"
%391 = phi i32 [ %437, %"168" ], [ 1, %"162.preheader" ]
%392 = load i32* %ny, align 4
"163": ; preds = %"166",
%"163.preheader"
%400 = phi i32 [ %435, %"166" ], [ 1, %"163.preheader" ]
%401 = load i32* %nx, align 4
In aliasing_f.f90 (2D loop case) problematic nx load could be taken out of
the loop PHI-node block using the following sequence of LLVM optimizations:
{
PassManager manager;
manager.add(new TargetData(m));
manager.add(createBasicAliasAnalysisPass());
manager.add(createInstructionCombiningPass());
manager.add(createCFGSimplificationPass());
manager.add(createScalarReplAggregatesPass());
manager.add(createSimplifyLibCallsPass());
manager.add(createInstructionCombiningPass());
manager.add(createCFGSimplificationPass());
manager.add(createLoopRotatePass());
manager.add(createLICMPass());
manager.run(*m);
}
Unfortunately, this helps only for 2D loops. Even LLVM's -O3 is unable to
pull loads out of 3D loop (the case of original demo_f.f90 test). The only
solutions seems to be
dragonegg -O1 -fplugin-arg-dragonegg-enable-gcc-optzns demo_f.f90
I.e. only adding some of GCC's own optimization helps.
- D.
2013/1/2 Dmitry Mikushin <dmitry at kernelgen.org>
> Hi Duncan & Tobi,
>
> Thanks a lot for your interest, and for pointing out differences in GIMPLE
> I missed.
>
> Attached is simplified test case. Is it good?
>
> Tobi, regarding runtime alias analysis: in KernelGen we already do it
> along with runtime values substitution. For example:
>
> <------------------ __kernelgen_main_loop_17: compile started
> --------------------->
> Integer args substituted:
> offset = 32, ptrValue = 47248855040
> offset = 40, ptrValue = 47246749696
> offset = 48, ptrValue = 47247802368
> offset = 16, value = 64
> offset = 20, value = 64
> offset = 24, value = 64
> MemoryAccess to pointer: float* inttoptr (i64 47246749696 to float*)
> { Stmt__12_cloned_[i0, i1, i2] -> MemRef_nttoptr (i64 47246749696 to
> float*)[4096i0 + 64i1 + i2] }
> allocSize: 4 storeSize: 4
> replacedBy: { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47246749696 + 16384i0 + 256i1 + 4i2 and o0 <= 47246749699 + 16384i0 + 256i1
> + 4i2 }
> MemoryAccess to pointer: float* inttoptr (i64 47247802368 to float*)
> { Stmt__12_cloned_[i0, i1, i2] -> MemRef_nttoptr (i64 47247802368 to
> float*)[4096i0 + 64i1 + i2] }
> allocSize: 4 storeSize: 4
> replacedBy: { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47247802368 + 16384i0 + 256i1 + 4i2 and o0 <= 47247802371 + 16384i0 + 256i1
> + 4i2 }
> MemoryAccess to pointer: float* inttoptr (i64 47248855040 to float*)
> { Stmt__12_cloned_[i0, i1, i2] -> MemRef_nttoptr (i64 47248855040 to
> float*)[4096i0 + 64i1 + i2] }
> allocSize: 4 storeSize: 4
> replacedBy: { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47248855040 + 16384i0 + 256i1 + 4i2 and o0 <= 47248855043 + 16384i0 + 256i1
> + 4i2 }
>
> Number of good nested parallel loops: 3
> Average size of loops: 64 64 64
>
> <------------------------------ Scop: end
> ----------------------------------->
>
> <------------------------------ Scop: start
> --------------------------------->
> <------------------- Cloog AST of Scop ------------------->
> for (c2=0;c2<=63;c2++) {
> for (c4=0;c4<=63;c4++) {
> for (c6=0;c6<=63;c6++) {
> Stmt__12_cloned_(c2,c4,c6);
> }
> }
> }
> <--------------------------------------------------------->
> Context:
> { : }
> Statements {
> Stmt__12_cloned_
> Domain :=
> { Stmt__12_cloned_[i0, i1, i2] : i0 >= 0 and i0 <= 63 and
> i1 >= 0 and i1 <= 63 and i2 >= 0 and i2 <= 63 };
> Scattering :=
> { Stmt__12_cloned_[i0, i1, i2] -> scattering[0, i0, 0, i1,
> 0, i2, 0] };
> ReadAccess :=
> { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47246749696 + 16384i0 + 256i1 + 4i2 and o0 <= 47246749699 + 16384i0 + 256i1
> + 4i2 };
> ReadAccess :=
> { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47247802368 + 16384i0 + 256i1 + 4i2 and o0 <= 47247802371 + 16384i0 + 256i1
> + 4i2 };
> WriteAccess :=
> { Stmt__12_cloned_[i0, i1, i2] -> NULL[o0] : o0 >=
> 47248855040 + 16384i0 + 256i1 + 4i2 and o0 <= 47248855043 + 16384i0 + 256i1
> + 4i2 };
> }
> <------------------------------ Scop: end
> ----------------------------------->
> <------------------------------ Scop: dependences
> --------------------------->
> Write after read dependences:
> { }
> Read after write dependences:
> { }
> Write after write dependences:
> { }
> loop is parallel
> loop is parallel
> loop is parallel
> <------------------------------ Scop: dependences end
> ----------------------->
> 1 polly-detect - Number of regions that a valid part of Scop
> <------------------ __kernelgen_main_loop_17: compile completed
> ------------------->
>
> It works pretty well in many situations, but in this particular case it
> does not help. Those problematic "Fortran scalar values referred by
> pointers" (FSVRPs) can only substituted (replaced by actual value) after
> proper memory analysis. According to current design, memory analysis
> operates on SCoPs, but Polly is already unable to detect SCoP for the whole
> group of nested loops due to presence of those FSVRPs. So, chicken and egg
> problem.
>
> - D.
>
>
> 2013/1/2 Tobias Grosser <tobias at grosser.es>
>
>> On 01/01/2013 02:45 PM, Duncan Sands wrote:
>>
>>> Hi Dmitry,
>>>
>>>
>>>> In our compiler we use a modified version LLVM Polly, which is very
>>>> sensitive to
>>>> proper code generation. Among the number of limitations, the loop region
>>>> (enclosed by phi node on induction variable and branch) is required to
>>>> be free
>>>> of additional memory-dependent branches. In other words, there must be
>>>> no
>>>> conditional "br" instructions below phi nodes. The problem we are
>>>> facing is that
>>>> from *identical* GIMPLE for 3d loop used in different contexts
>>>> DragonEgg may
>>>> generate LLVM IR either conforming the described limitation, or
>>>> violating it.
>>>>
>>>
>>> the gimple isn't the same at all (see below). The differences are
>>> directly
>>> reflected in the unoptimized LLVM IR, turning up as additional memory
>>> loads
>>> in the "bad" version. In addition, the Fortran isn't really the same
>>> either:
>>> Fortran semantics allows the compiler to assume that the parameters of
>>> your
>>> new function "compute" (which are all passed in by reference, i.e. as
>>> pointers)
>>> do not alias each other or anything else in sight (i.e. they get the
>>> "restrict"
>>> qualifier in the gimple, noalias in the LLVM IR). Thus by factorizing
>>> the loop
>>> into "compute" you are actually giving the compiler more information.
>>>
>>> Summary:
>>> (1) as far as I can see the unoptimized LLVM IR is a direct
>>> reflection of
>>> the gimple: the differences for the loop part come directly from
>>> differences
>>> in the gimple;
>>> (2) the optimizers do a better good when you have "compute" partly
>>> because you
>>> provided them with additional aliasing information; this better optimized
>>> version then gets inlined into MAIN__.
>>> (3) this leaves the question of whether in the bad version it is
>>> logically
>>> possible for the optimizers to deduce the same aliasing information as is
>>> handed to them for free in the good version. To work this out it would
>>> be
>>> nice to have a smaller testcase.
>>>
>>
>> I would also be interested in a minimal test case. If e.g. only the alias
>> check is missing, we could introduce run-time alias checks such that Polly
>> would be able to optimize both versions. It is probably not as simple, but
>> a reduced test case would make it easier to figure out the exact problems.
>>
>> Thanks
>> Tobi
>>
>>
>> ______________________________**_________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130209/36abf4d9/attachment.html>
More information about the llvm-dev
mailing list