[all-commits] [llvm/llvm-project] 9684c8: [flang][runtime] Fixed performance regression in C...
Slava Zakharin via All-commits
all-commits at lists.llvm.org
Tue Aug 6 08:23:42 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 9684c87d1402ea9327c1abd7f56bafed8e751f51
https://github.com/llvm/llvm-project/commit/9684c87d1402ea9327c1abd7f56bafed8e751f51
Author: Slava Zakharin <szakharin at nvidia.com>
Date: 2024-08-06 (Tue, 06 Aug 2024)
Changed paths:
M flang/runtime/copy.cpp
Log Message:
-----------
[flang][runtime] Fixed performance regression in CopyElement. (#102081)
Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to
60% regression on x86 after #101421. The memcpy loops over the toAt and
fromAt arrays, which run to create the initial work item, end up
being encoded as 'rep mov', and they add noticeable overhead
compared to the total amount of work. 'rep mov' is not the best
choice for small memcpy sizes (e.g. when the array rank is 1 or 2,
it is quite slow). Moreover, the rest of the stack-related
setup is also noticeable in the simple cases.
I added a shortcut for the simple copy case, and also got rid
of the initial toAt/fromAt copies by allowing the CopyDescriptor
to use external subscript storage.
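
The two ideas above can be illustrated with a minimal sketch. It is not
the code in flang/runtime/copy.cpp: SubscriptCursor, CopyElementSketch,
kMaxRank, and the needsDeepCopy flag are hypothetical names chosen for
illustration, and the actual condition for the "simple copy case" is
not stated in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; this only illustrates the idea.
constexpr int kMaxRank = 15; // assumed upper bound on array rank
using Subscript = std::int64_t;

// A cursor that either borrows caller-owned subscripts or keeps a
// private copy. Borrowing avoids the initial per-item subscript copy.
class SubscriptCursor {
public:
  // Borrowing constructor: points at external storage, copies nothing.
  explicit SubscriptCursor(Subscript *external) : at_{external} {}

  // Owning constructor: copies `rank` subscripts into local storage.
  SubscriptCursor(const Subscript *init, int rank) : at_{own_} {
    assert(rank >= 0 && rank <= kMaxRank);
    for (int i = 0; i < rank; ++i) {
      own_[i] = init[i];
    }
  }

  // Copying would leave at_ pointing into another object's own_.
  SubscriptCursor(const SubscriptCursor &) = delete;
  SubscriptCursor &operator=(const SubscriptCursor &) = delete;

  Subscript *at() { return at_; }
  bool borrows() const { return at_ != own_; }

private:
  Subscript own_[kMaxRank]; // unused when borrowing
  Subscript *at_;
};

// Shortcut: when an element needs no per-component processing, copy its
// bytes directly and skip the general work-stack setup. Returns false
// when the caller has to fall back to the general path.
inline bool CopyElementSketch(
    void *to, const void *from, std::size_t bytes, bool needsDeepCopy) {
  if (!needsDeepCopy) {
    std::memcpy(to, from, bytes);
    return true;
  }
  return false;
}
```

In this sketch, a caller that already holds the subscripts constructs
the cursor with the borrowing constructor, so updates made through
at() are visible in the caller's array and no rank-sized copy happens
up front.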