[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Thu Jan 16 10:02:45 PST 2014

On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Diego,
>
> It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor:
>
>> LV: Found an estimated cost of 6 for VF 2 For instruction:   %3 = load
>> i64* %state, align 8, !dbg !58, !tbaa !61
>
>
> It looks like a cost model problem.  The vectorizer thinks that loading %3 (above) is non-consecutive and would require scatter/gather.  Is that correct? I wonder what SCEV is reporting. Is there an index overflow problem that is preventing us from loading consecutive elements?

Yes, I forgot to mention that. The access is non-consecutive:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

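The code is writing to the 'state' field of each node, so successive
iterations touch addresses one whole quantum_reg_node apart rather than
adjacent ones. The data structures look like this: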

typedef struct quantum_matrix_struct quantum_matrix;

struct quantum_reg_node_struct
{
  COMPLEX_FLOAT amplitude; /* alpha_j */
  MAX_UNSIGNED state;      /* j */
};

typedef struct quantum_reg_node_struct quantum_reg_node;

/* The quantum register */

struct quantum_reg_struct
{
  int width;    /* number of qubits in the qureg */
  int size;     /* number of non-zero vectors */
  int hashw;    /* width of the hash array */
  quantum_reg_node *node;
  int *hash;
};
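
To make the stride concrete, here is a minimal sketch that measures the
layout of one node. It assumes COMPLEX_FLOAT is 'float _Complex' and
MAX_UNSIGNED is 'unsigned long long'; both typedefs are assumptions for
illustration (the real definitions come from the library's configuration
header). The helper names state_stride, state_offset and
state_is_consecutive are hypothetical.

```c
#include <stddef.h>

/* Assumed definitions, for illustration only. */
typedef float _Complex COMPLEX_FLOAT;
typedef unsigned long long MAX_UNSIGNED;

struct quantum_reg_node_struct
{
  COMPLEX_FLOAT amplitude; /* alpha_j */
  MAX_UNSIGNED state;      /* j */
};

typedef struct quantum_reg_node_struct quantum_reg_node;

/* Byte distance between node[i].state and node[i+1].state. */
size_t state_stride(void)
{
  return sizeof(quantum_reg_node);
}

/* Byte offset of 'state' inside a single node. */
size_t state_offset(void)
{
  return offsetof(quantum_reg_node, state);
}

/* Nonzero only if successive 'state' fields would be adjacent in memory,
   i.e. if the stride equals the size of the field itself. */
int state_is_consecutive(void)
{
  return state_stride() == sizeof(MAX_UNSIGNED);
}
```

Because every node also carries an amplitude, the stride is always larger
than sizeof(MAX_UNSIGNED), so state_is_consecutive() returns 0 and the
'state' accesses in the loop are strided.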

If you do the trick of writing to a separate array instead, the accesses
become consecutive and the loop can be vectorized.
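
A minimal sketch of that comparison, under the same assumed typedefs for
COMPLEX_FLOAT and MAX_UNSIGNED as in libquantum-style code (both are
assumptions here). The function names flip_interleaved and flip_separate
are hypothetical; the first mirrors the loop above, the second shows the
same update on a plain array of states.

```c
/* Assumed definitions, for illustration only. */
typedef float _Complex COMPLEX_FLOAT;
typedef unsigned long long MAX_UNSIGNED;

typedef struct
{
  COMPLEX_FLOAT amplitude; /* alpha_j */
  MAX_UNSIGNED state;      /* j */
} quantum_reg_node;

/* Interleaved layout: each iteration steps over a whole node,
   so the 'state' accesses are strided. */
void flip_interleaved(quantum_reg_node *node, int size, int target)
{
  int i;
  for (i = 0; i < size; i++)
    node[i].state ^= ((MAX_UNSIGNED) 1 << target);
}

/* Separate array: the states are adjacent in memory,
   so the accesses are consecutive. */
void flip_separate(MAX_UNSIGNED *state, int size, int target)
{
  int i;
  for (i = 0; i < size; i++)
    state[i] ^= ((MAX_UNSIGNED) 1 << target);
}
```

Both functions compute the same result; only the memory layout differs.
Whether a given compiler actually vectorizes either loop still depends on
its cost model and the target.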

Diego.