[LLVMbugs] [Bug 20006] New: uchar4 vector element-by-element LUT handled worse than 3.4.
bugzilla-daemon at llvm.org
Wed Jun 11 13:05:13 PDT 2014
http://llvm.org/bugs/show_bug.cgi?id=20006
Bug ID: 20006
Summary: uchar4 vector element-by-element LUT handled worse than 3.4.
Product: clang
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: -New Bugs
Assignee: unassignedclangbugs at nondot.org
Reporter: simon.hosie at arm.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 12640
--> http://llvm.org/bugs/attachment.cgi?id=12640&action=edit
Function demonstrating regression.
Given a simple lookup-table (LUT) loop that operates independently on each
element of a uchar4, trunk presses ahead with a series of inserts and extracts
through a temporary vector, along with a flurry of type-conversion operations.
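The attached function is not reproduced in this archive. As a hypothetical illustration only, a loop of the kind described might look like the sketch below. The `uchar4` typedef (a GCC/Clang `vector_size` extension type) and the name `lut_apply` are assumptions, not taken from the attachment.

```c
#include <stddef.h>

/* Hypothetical 4 x unsigned char vector type using the GCC/Clang
   vector_size extension; the attached test case may define uchar4
   differently (e.g. via OpenCL-style ext_vector_type). */
typedef unsigned char uchar4 __attribute__((vector_size(4)));

/* Hypothetical LUT kernel: each of the four lanes of every element is
   looked up independently, which is the element-by-element pattern the
   report describes. */
void lut_apply(uchar4 *dst, const uchar4 *src,
               const unsigned char lut[256], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uchar4 v = src[i];
        uchar4 r = {0, 0, 0, 0};
        for (int k = 0; k < 4; k++)
            r[k] = lut[v[k]];   /* per-lane extract, lookup, insert */
        dst[i] = r;
    }
}
```

Written this way, the source itself is phrased as per-lane extracts and inserts on a vector value, which is the shape trunk is reported to keep rather than lowering it to plain byte loads.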
In contrast, Clang 3.4 appears to abandon the vector pretense at the outset and
take the scalar values directly from the source pointer, completing the work in
half the time.
GCC 4.8 appears to behave like Clang 3.4, but additionally packs the output
into a single 32-bit scalar register before writing.
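As a rough source-level analogy (not actual compiler output), the code shape attributed to Clang 3.4 and GCC 4.8 above corresponds to reading the four bytes of each element straight from the source pointer, looking each one up, and, in GCC's case, packing the results into one 32-bit word that is written with a single store. The function name `lut_apply_scalar` is hypothetical, and the shift-based packing assumes a little-endian target (as on amd64 and typical ARM configurations), so that byte k of the word corresponds to lane k.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar formulation of a uchar4 LUT loop.
   n is the number of 4-byte elements.
   Assumes a little-endian target: byte k of `word` lands at dst[4*i + k]. */
void lut_apply_scalar(unsigned char *dst, const unsigned char *src,
                      const unsigned char lut[256], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const unsigned char *s = src + 4 * i;   /* scalar loads from source */
        uint32_t word = (uint32_t)lut[s[0]]
                      | (uint32_t)lut[s[1]] << 8
                      | (uint32_t)lut[s[2]] << 16
                      | (uint32_t)lut[s[3]] << 24;
        memcpy(dst + 4 * i, &word, sizeof word); /* one 32-bit store */
    }
}
```

Using `memcpy` for the store avoids strict-aliasing and alignment problems; compilers normally lower a fixed 4-byte `memcpy` to a single 32-bit store.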
This affects both amd64 and ARM. Compiling with -Ofast is enough to reproduce
it on amd64; on ARM, also add -mfpu=neon so that SIMD operations are available.
More information about the llvm-bugs mailing list