[LLVMdev] Adding masked vector load and store intrinsics

Fri Oct 24 11:44:02 PDT 2014

Is there an example of such a workload (say, from SPEC CPU2006 or a similar suite) that you have in mind, and how much gain do you expect?
- dibyendu

-----Original Message-----
From: dag at cray.com [mailto:dag at cray.com] 
Sent: Friday, October 24, 2014 10:52 PM
To: Das, Dibyendu
Cc: 'elena.demikhovsky at intel.com'; 'llvmdev at cs.uiuc.edu'
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:

> This looks to be a reasonable proposal. However, native instructions 
> that support such masked ld/st may have a high latency? Also, it 
> would be good to state some workloads where this will have a positive 
> impact.

Any significant vector workload will see a giant gain from this.

The masked operations really shouldn't have any more latency.  The time of the memory operation itself dominates.
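
To make the kind of loop in question concrete, here is a minimal scalar sketch in C of a conditional update expressed with masked loads and stores. It is only an illustration: the names masked_load, masked_store, cond_update and VLEN are hypothetical and are not the proposed intrinsics or any LLVM API, and the lane loops stand in for what would be single vector instructions on hardware with masking support.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector length for this sketch; real targets differ. */
#define VLEN 8

/* Scalar model of a masked load: enabled lanes read memory, disabled
 * lanes take the pass-through value and never touch memory. */
static void masked_load(int32_t dst[VLEN], const int32_t *src,
                        const uint8_t mask[VLEN],
                        const int32_t passthru[VLEN])
{
    for (size_t l = 0; l < VLEN; ++l)
        dst[l] = mask[l] ? src[l] : passthru[l];
}

/* Scalar model of a masked store: only enabled lanes write memory. */
static void masked_store(int32_t *dst, const int32_t val[VLEN],
                         const uint8_t mask[VLEN])
{
    for (size_t l = 0; l < VLEN; ++l)
        if (mask[l])
            dst[l] = val[l];
}

/* Equivalent to: for (i = 0; i < n; i++) if (c[i] > 0) a[i] = b[i] + 1;
 * processed VLEN elements at a time. The mask covers both the loop
 * condition and the tail past n, so no scalar remainder loop is needed. */
static void cond_update(int32_t *a, const int32_t *b, const int32_t *c,
                        size_t n)
{
    const int32_t zero[VLEN] = {0};

    for (size_t i = 0; i < n; i += VLEN) {
        uint8_t mask[VLEN];
        int32_t vb[VLEN], va[VLEN];

        /* Lane is active only if it is in bounds and the condition holds. */
        for (size_t l = 0; l < VLEN; ++l)
            mask[l] = (i + l < n) && (c[i + l] > 0);

        masked_load(vb, b + i, mask, zero);
        for (size_t l = 0; l < VLEN; ++l)
            va[l] = vb[l] + 1;
        masked_store(a + i, va, mask);
    }
}
```

Without masked memory operations, a vectorizer has to prove that the disabled lanes are safe to read and write, or else leave a loop like this scalar.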

                            -David