[PATCH] D148868: [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 26 16:02:19 PDT 2023


wenlei added a comment.

In D148868#4300107 <https://reviews.llvm.org/D148868#4300107>, @huangjd wrote:

> In D148868#4297862 <https://reviews.llvm.org/D148868#4297862>, @wenlei wrote:
>
>> Hoisting all the MD5 stuff out of compact binary format, and merge them with extended binary format makes sense. Initially compact binary format was the only one using MD5, but later on Wei added MD5 support in extended binary as well, which caused some of the duplication this patch is trying to address.
>>
>> However, from design POV, it's a good practice to keep the specific implementation down to leaf type as much as possible. Here MD5 only comes into play for extended binary and compact binary, so exposing that to all binary format is less than ideal. I think the problem is that compact binary and extended binary started out to be very different, but involved into something similar, so the type structure no longer reflects that commonality. Now in order to deduplicate code, you have to expose MD5 stuff to their lowest common ancestor, which is binary format.
>>
>> Actually in https://reviews.llvm.org/D76255, we talked about eventually remove compact binary. Can we simply remove compact binary now? Then we can keep MD5 stuff in extended binary still, and there will be no duplicates.
>>
>> cc @hoy @wlei
>
> I would like to limit the scope of the refactoring not to include another major change, since other reviewers would like to go through small patches. The purpose of my series of refactoring is to change the data representation of profile map, using MD5 as the key, which could bring significant speedup to profile load time. Reducing overloaded functions with similar logic make the code less error prone when changing the data representation.

Sure, limit changes to NFC only is good. But the problem is you are exposing MD5 stuff higher up in the type hierarchy to types that shouldn't need to be aware of MD5 details - this created a weird structure. The real problem is the out of sync type structure between compact binary and extended binary. I think that removing compact binary will help this refactoring and also avoid weird structure.

In terms of change size, you can split up a change just to remove compact binary support, then go back to the refactoring. If the goal is to clean things up, it's better to remove obsolete stuff first.

> TL;DR Eventually, based on external settings (whether we are using ProfileData in compiler or tools), we want the ability to choose whether to store names as MD5, regardless if the actual name table from the file is using Strings, MD5, or fixed length MD5, so the logic should be unified.

Yeah, and compact binary doesn't have that flexibility. Compact binary become deprecated/obsolete the moment we introduce MD5 to extended binary. We wanted wait for some time before removing it. Now 3 years went by and it's time, and it sort of is getting in the way of this refactoring.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148868/new/

https://reviews.llvm.org/D148868



More information about the llvm-commits mailing list