[LLVMdev] Adding masked vector load and store intrinsics
Dibyendu.Das at amd.com
Fri Oct 24 11:44:02 PDT 2014
Is there an example of such a workload (say, from the SPEC CPU 2006 suite or similar) that you have in mind, and how much gain do you expect?
From: dag at cray.com [mailto:dag at cray.com]
Sent: Friday, October 24, 2014 10:52 PM
To: Das, Dibyendu
Cc: 'elena.demikhovsky at intel.com'; 'llvmdev at cs.uiuc.edu'
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:
> This looks to be a reasonable proposal. However, native instructions
> that support such masked ld/st may have a high latency? Also, it
> would be good to state some workloads where this will have a positive
Any significant vector workload will see a giant gain from this.
The masked operations really shouldn't have any more latency. The time of the memory operation itself dominates.
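
To make the discussion concrete, here is a minimal sketch in C of what a masked load and a masked store mean lane by lane, together with the kind of conditional loop a vectorizer could map onto them. The function names (masked_load, masked_store, conditional_increment) and the pass-through argument are hypothetical, written as scalar loops purely for illustration; they are not the proposed intrinsics or any LLVM API.

```c
#include <stddef.h>

/* Hypothetical scalar model of a masked load: a lane whose mask entry is
   zero is never read from memory and takes the pass-through value instead. */
void masked_load(int *dst, const int *src, const unsigned char *mask,
                 const int *passthru, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = mask[i] ? src[i] : passthru[i];
}

/* Hypothetical scalar model of a masked store: a lane whose mask entry is
   zero leaves the destination memory untouched. */
void masked_store(int *dst, const int *val, const unsigned char *mask,
                  size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (mask[i])
            dst[i] = val[i];
}

/* A conditional update: only elements whose condition is positive are
   read and written. Without masked memory operations a vectorizer must
   either leave this loop scalar or prove that the unconditional accesses
   are safe. */
void conditional_increment(int *a, const int *cond, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (cond[i] > 0)
            a[i] = a[i] + 1;
}
```

In a vectorized form of conditional_increment, the comparison would produce the mask, and the load and store of a[i] would become the masked operations modelled above.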