[LLVMdev] Contributing to Polly with GSOC 2011

Mon Mar 21 11:20:51 PDT 2011

Dear all,

I am Raghesh, a student pursuing M.Tech at Indian Institute of
Technology, Madras, India.

I would like to make contribution to the Polly project
(http://wiki.llvm.org/Polyhedral_optimization_framework) as part of
GSOC 2011. I have gained some experience working in OpenMP Code
generation for Polly. This is almost stable now and planning to test
with the polybench benchmarks.

Some of the ideas that can be implemented in Polly is listed below.
Please let me know  your comments on this.

1. Increasing the Stability and Coverage of Polly
-------------------------------------------------------------

Polly can show good speedup on several test cases. there are still
many programs that cannot be optimized by it. One reason is that Polly
does not yet support casts like (sext, zext, trunk). Those often
appear implicit in programs, as they often have 32 bit induction
variables, but require 64 bit indexes for array subscripts.

For example:

for (int i = 0; i < N; i++)
 A[i] =

If we translate this to LLVM-IR and keep i an i32 but use an i64 to
calculate the access to A[i] there will be a sext necessary. Polly
currently do not handle this.

2. Testing Real Programs.
---------------------------------
Testing with well known benchmarks like SPEC CPU2006. This will help
us to find out cases where Polly cannot optimize and improve the
coverage of Polly on existing Programs.

3. Profiling in polly
-----------------------
The idea is explained below with a few examples. Consider the following code.
 scanf ( ”%d” , &b ) ;

 for ( i = 0 ; i < N; i +=b) {
         body ;
 }

Polly may not detect this as a SCoP because the variable b is read as
an user input. So to detect this as a SCoP we instrument the IR with
the information provided by profiling.
Suppose using profiling we figure out that most of the time the value
of b is say 2. we can convert the
above code as follows.

  scanf ( ”%d” , &b ) ;
   if ( b == 2 ) {
       for ( i = 0 ; i < N; i += 2 ) {
               body ;
       }
  } else {
           f o r ( i = 0 ; i < N; i += b ) {
                   body ;
           }
  }

Now with the transformed code the for loop inside ’if’ will be
detected as a SCoP and can be parallelised.
Since value of N is 100 most of the time, the overall performance will
be improved.

Consider another scenario.
  f o r ( i = 0 ; i < N; i ++) {
           body ;
  }

Suppose using profiling we know that N is always very small. So there
wont be much gain from parallelising
it. So we have to tell polly that don’t detect this as a SCoP if N is
less than a specific value.

Some other immediate applications would be

* Automatially derive the best scheduling strategy supported by OpenMP.
* Adding simple profiling support, to understand how much time is
spent inside each scop.

Andreas Simbuerger has done some significant work on this and can be extended.

5. Porting Polly to Various architectures.
-------------------------------------------------

Currently Polly generates everything as 64 bit integer, which is
problamatic for embedded platforms.

6. Vectorization in Polly
------------------------------
Lot of work needed to be done in this area.

-- 
Raghesh
II MTECH
Room No: 0xFF
Mahanadhi Hostel
IIT Madras