[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Thu Jan 16 10:06:45 PST 2014

On Jan 16, 2014, at 10:02 AM, Diego Novillo <dnovillo at google.com> wrote:

> On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote:
>> Hi Diego,
>> 
>> It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor:
>> 
>>> LV: Found an estimated cost of 6 for VF 2 For instruction:   %3 = load
>>> i64* %state, align 8, !dbg !58, !tbaa !61
>> 
>> 
>> It looks like a cost model problem.  The vectorizer thinks that loading %3 (above) is non consecutive and would require scatter/gather.  Is that correct? I wonder that SCEV is reporting. Is there an index overflow problem that is preventing us from loading consecutive elements?
> 
> Yes, I forgot to mention that. The access is non-consecutive:
> 
>      for(i=0; i<reg->size; i++)
>         {
>             /* Flip the target bit of each basis state */
>             reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
> }
> 
> The code is writing to the 'state' field. The data structures look like this:
> 
> typedef struct quantum_matrix_struct quantum_matrix;
> 
> struct quantum_reg_node_struct
> {
>  COMPLEX_FLOAT amplitude; /* alpha_j */
>  MAX_UNSIGNED state;      /* j */
> };
> 
> typedef struct quantum_reg_node_struct quantum_reg_node;
> 
> /* The quantum register */
> 
> struct quantum_reg_struct
> {
>  int width;    /* number of qubits in the qureg */
>  int size;     /* number of non-zero vectors */
>  int hashw;    /* width of the hash array */
>  quantum_reg_node *node;
>  int *hash;
> };
> 
> If you do the trick of writing to a separate array, then the loop can
> be vectorized.
> 

Do you mean that we should change the layout of the array?  Or is there another trick?

> 
> Diego.