[Lldb-commits] [PATCH] D118794: [lldb] Remove non-address bits from read/write addresses in lldb

David Spickett via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Thu Apr 7 06:45:32 PDT 2022


DavidSpickett added a comment.

> Do you have any explanation on this from AARM

Yes, I do.

  Linux, arch/arm64/kernel/ptrace.c:
    /*
     * The PAC bits can differ across data and instruction pointers
     * depending on TCR_EL1.TBID*, which we may make use of in future, so
     * we expose separate masks.
     */
    unsigned long mask = ptrauth_user_pac_mask();
    struct user_pac_mask uregs = {
      .data_mask = mask,
      .insn_mask = mask,
    };

So currently we'll only ever see the same value in both masks. The control bit this comment refers to is:

  D13.2.131 TCR_EL1, Translation Control Register (EL1)
  
  TBID0, bit [51]
  
  0b0 TCR_EL1.TBI0 applies to Instruction and Data accesses.
  0b1 TCR_EL1.TBI0 applies to Data accesses only.
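To make the TBID0 encoding concrete, here is a minimal C sketch of how the effective top byte ignore setting for the lower (userspace) address range could be decoded from a TCR_EL1 value. `tbi0_applies` is a hypothetical helper, and the TBI0 position (bit 37) is an assumption taken from the Arm ARM's TCR_EL1 layout; only TBID0 at bit 51 comes from the excerpt above.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCR_EL1 bit positions. TBID0 (bit 51) is from the excerpt above;
   TBI0 at bit 37 is an assumption based on the Arm ARM register layout. */
#define TCR_EL1_TBI0_SHIFT  37
#define TCR_EL1_TBID0_SHIFT 51

/* Hypothetical helper: does top byte ignore apply to an access through
   the lower (TTBR0, i.e. userspace) address range? */
bool tbi0_applies(uint64_t tcr_el1, bool is_instruction) {
    bool tbi0 = (tcr_el1 >> TCR_EL1_TBI0_SHIFT) & 1u;
    bool tbid0 = (tcr_el1 >> TCR_EL1_TBID0_SHIFT) & 1u;
    if (!tbi0)
        return false;      /* top byte ignore is off for this range */
    if (is_instruction)
        return !tbid0;     /* TBID0 == 1: TBI0 applies to data accesses only */
    return true;           /* data accesses always follow TBI0 */
}
```

With TBI0 and TBID0 both set, data accesses get top byte ignore while instruction fetches do not, which is the split the rest of this comment is concerned with.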

How this interacts with PAC is described earlier in the same manual:

  Supported PAC field and relation to the use of address tagging
  
  When address tagging is used
  The PAC field is Xn[54:bottom_PAC_bit].
  
  When address tagging is not used
  The PAC field is Xn[63:56, 54:bottom_PAC_bit].

The upshot is that you could have top byte ignore (TBI) and PAC for data, but only PAC for instruction addresses.
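A sketch of how the two PAC field layouts turn into masks, assuming a 48-bit virtual address space so that bottom_PAC_bit is 48 (consistent with the masks lldb reports further down). `pac_field_mask` and `bit_range` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering bits [hi:lo] inclusive; requires lo <= hi <= 63. */
static uint64_t bit_range(unsigned hi, unsigned lo) {
    uint64_t upper = (hi == 63) ? ~0ULL : ((1ULL << (hi + 1)) - 1);
    return upper & ~((1ULL << lo) - 1);
}

/* PAC field per the excerpt above (bottom_pac_bit assumed <= 54).
   Bit 55 is never part of it because it selects the upper/lower range. */
uint64_t pac_field_mask(unsigned bottom_pac_bit, bool address_tagging) {
    uint64_t mask = bit_range(54, bottom_pac_bit);  /* Xn[54:bottom_PAC_bit] */
    if (!address_tagging)
        mask |= bit_range(63, 56);                   /* plus Xn[63:56] */
    return mask;
}
```

Here `pac_field_mask(48, true)` gives 0x007f000000000000 (data, with TBI) and `pac_field_mask(48, false)` gives 0xff7f000000000000 (instructions, without TBI).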

PAC itself is all or nothing: at the hardware level it's either on or off. If you didn't want to use it for one of code or data,
your runtime would simply choose not to sign those pointers, as arm64e appears to do for data
(https://developer.apple.com/documentation/security/preparing_your_app_to_work_with_pointer_authentication).

These are the current masks lldb shows, which already include top byte ignore:

  (lldb) process status --verbose
  <...>
  Addressable code address mask: 0xff7f000000000000
  Addressable data address mask: 0xff7f000000000000
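For reference, a minimal sketch of how a mask like this is applied: bit 55 selects the upper or lower address range, so lower-range addresses have the masked bits cleared and upper-range addresses have them set. This follows my understanding of what lldb's AArch64 ABI plugins do, but `strip_non_address_bits` is an illustrative name, not lldb's code.

```c
#include <stdint.h>

/* Mask lldb reports above for both code and data on this target. */
#define NON_ADDRESS_MASK 0xff7f000000000000ULL
/* Bit 55 selects the upper (kernel) or lower (user) address range. */
#define RANGE_SELECT_BIT (1ULL << 55)

/* Remove non-address bits: clear them for lower-range addresses,
   set them for upper-range addresses (sign extension from bit 55). */
uint64_t strip_non_address_bits(uint64_t addr, uint64_t mask) {
    return (addr & RANGE_SELECT_BIT) ? (addr | mask) : (addr & ~mask);
}
```

For example, a tagged and signed user pointer 0x5a12000000401000 becomes 0x401000, while an upper-range address such as 0x80ab800000001000 becomes 0xffff800000001000.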

So the end result is the same for us. What could happen is that a future extension other than top byte ignore uses
those bits instead of PAC, making the PAC-specific mask 0x007f...

Though I don't know how Linux would reconcile having enabled TBI for userspace with then doing that. Maybe top byte
use is rare enough that it could be changed (especially for code addresses), but the chances seem slim to me.

So back to my ideas in the previous comment.

> Assume that they're the same, which does work for Linux, for now.

This would work fine for Linux now, and probably for a long time, given that changing the TBI setting would be seen as an ABI issue.
And if someone decided to disable TBI completely and only use PAC, this still works because PAC then extends into the top byte.

If they do decide to disable TBI for instructions, we're still fine, because the mask needed to extract the virtual address stays
the same. Yes, the PAC mask has changed, but the debugger wants to remove *all* non-address bits.

For example, if we disable TBI for instruction accesses, the instruction PAC mask is 0xff7f000000000000 because PAC claims the top byte.
The data PAC mask is then 0x007f000000000000, but adding the TBI top byte gives 0xff7f000000000000. Same result in the end.

So we could just pick one of the methods and standardise on it for situations where you don't know for sure that it'll be a code address.
This will have to be `FixDataAddress` because of the Arm Thumb mode bit (bit 0): we don't want to be aligning all reads to 2 bytes.
(FWIW this matches what I've done so far, though that was unintentional.)

Perhaps we add a third method to make that clear: `FixAnyAddress` (name subject to change). The Arm code can then forward it to `FixDataAddress`, and AArch64
can pick either data or code. In situations where you're sure, you can still pick code or data, e.g. a code breakpoint on an address.

> Add a method that does both fixes, on the assumption that the virtual address size for code and data is the same so no harm done and all bits will be removed either way.

The Arm Thumb problem means this is not going to work. (Not that those targets are likely to care about non-address bits, but these Fix calls are made from generic code,
so it still matters.)

> Extensively track whether addresses refer to code or data

This isn't realistic a lot of the time. There are some clear situations where `FixCodeAddress` or `FixDataAddress` makes more sense, so we can do some of this,
just not as an lldb-wide tracking framework.

So my suggestion would be to add `FixAnyAddress` alongside `FixCodeAddress` and `FixDataAddress`, and use it whenever an address could be either. Tricky targets like Arm Thumb can
then choose whichever fix is the "safest".

Tell me if that logic makes sense.

> Which will mean we actually dont need two separate functions.

At the ABI plugin level we do, simply because Arm Thumb exists. Lower down, yes, you could get away with reading just one of the PAC masks, but it's not much of a saving.


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D118794


