[PATCH] D93838: [SCCP] Add Function Specialization pass

Fri May 7 23:55:57 PDT 2021

ChuanqiXu added a comment.

In D93838#2741605 <https://reviews.llvm.org/D93838#2741605>, @SjoerdMeijer wrote:

> In D93838#2741479 <https://reviews.llvm.org/D93838#2741479>, @ChuanqiXu wrote:
>
>> The benefit for 505.mcf_r comes from the specialization of `spec_qsort`. Here is the signature of `spec_qsort`:
>>
>>   void
>>   spec_qsort(void *a, size_t n, size_t es, cmp_t *cmp)
>>
>> Here `cmp_t *` is a function pointer, and `cmp` is used many times in `spec_qsort`. `spec_qsort` is called in two places in `505.mcf_r`:
>>
>>   spec_qsort(arcs_pointer_sorted[thread], new_arcs_array[thread], sizeof(arc_p),
>>                   (int (*)(const void *, const void *))arc_compare);
>>   spec_qsort(perm + 1, basket_sizes[thread], sizeof(BASKET*),
>>               (int (*)(const void *, const void *))cost_compare);
>>
>> Both `arc_compare` and `cost_compare` are global functions. This explains why function specialization benefits 505.mcf_r: it converts the indirect call into a direct call, and the direct call can then be inlined.
>>
>> This pattern looks common in everyday code. However, I wonder what happens when there are multiple call sites for `spec_qsort` with multiple global functions. It looks like the current code can't handle that situation, which is even more common in real projects. I am not asking for a change to the cost model; I think we can do that in the future. I am just sharing what I found.
>
> Many thanks for sharing. With my infrastructure/workflow problems (mostly) sorted to run and evaluate things, I have seen exactly the same, so can confirm this.
>
> My baseline is trunk, without this patch applied, in LTO mode. That uses the new pass manager, and since this new pass wasn't added to its LTO pipeline, I didn't see it triggering. With that fixed and this patch applied, I saw the 30% gain with 30% extra compile time. This was on an older and noisier AArch64 system, but the trend was clear, and the increased compile times in particular were very obvious and consistent. I will also run this on a newer system, but I am still setting that up.
>
> PS. About LTO, I didn't see this triggering in non-LTO mode on MCF. That's why I am only looking at LTO at the moment.
>
> Now that I have solid LTO numbers and compile-times, I am going to look at compile-times.

After applying this patch on trunk, I got a 10% performance increase on 505.mcf_r, which is consistent with my previous experiments. It looks like the hardware may be a key factor in this case. I will try to look into this.
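
To make the quoted `spec_qsort` explanation concrete, here is a minimal C sketch of the indirect-to-direct-call transformation. It is only an illustration under simplifying assumptions, not the pass's actual output: `sort_generic`, `sort_specialized_int_compare`, `int_compare` and `specialization_preserves_result` are hypothetical names, the sort is a plain insertion sort over `int` rather than the real `spec_qsort`, and the specialized clone is written by hand to show roughly what a clone for one constant `cmp` argument could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the comparator type used by spec_qsort. */
typedef int cmp_t(const void *, const void *);

/* Hypothetical comparator, standing in for arc_compare / cost_compare. */
static int int_compare(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Generic version: every comparison is an indirect call through cmp. */
static void sort_generic(int *a, size_t n, cmp_t *cmp) {
    for (size_t i = 1; i < n; i++) {
        for (size_t j = i; j > 0 && cmp(&a[j - 1], &a[j]) > 0; j--) {
            int t = a[j];
            a[j] = a[j - 1];
            a[j - 1] = t;
        }
    }
}

/* Hand-written stand-in for a clone specialized on cmp == int_compare:
 * the call is now direct, so the inliner can fold the comparator in. */
static void sort_specialized_int_compare(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        for (size_t j = i; j > 0 && int_compare(&a[j - 1], &a[j]) > 0; j--) {
            int t = a[j];
            a[j] = a[j - 1];
            a[j - 1] = t;
        }
    }
}

/* Returns 1 if both versions sort a fixed input to 1..5, else 0. */
int specialization_preserves_result(void) {
    int g[] = {3, 1, 2, 5, 4};
    int s[] = {3, 1, 2, 5, 4};
    sort_generic(g, 5, int_compare);
    sort_specialized_int_compare(s, 5);
    for (int i = 0; i < 5; i++) {
        if (g[i] != s[i] || g[i] != i + 1)
            return 0;
    }
    return 1;
}
```

A call site such as `sort_generic(a, n, int_compare)` would be redirected to the specialized clone. With several call sites passing different global functions, each distinct comparator would need its own clone, which is the multi-call-site case raised in the quoted comment.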

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93838/new/

https://reviews.llvm.org/D93838