<div dir="ltr"><div dir="ltr">> > Old: R RX RW(RELRO) RW<br>> > New: R(R+RELRO) RX RW;      R includes the traditional R part and the<br>> > RELRO part<br>> > Runtime (before relocation resolving): RW RX RW<br>> > Runtime (after relocation resolving): R RX RW<br>> ><br>> I actually see two ways of implementing this, and yes what you mentioned<br>> here is one of them:<br>>   1. Move RELRO to before RX, and merge it with R segment. This is what you<br>> said above.<br>>   2. Move RELRO to before RX, but keep it as a separate segment. This is<br>> what I implemented in my test.<br>> As I mentioned in my reply to Peter, option 1 would allow existing<br>> implementations to take advantage of this without any change. While I think<br>> this optimization is well worth it, if we go with option 1, the dynamic<br>> linkers won't have a choice to keep RO separate if they want to for<br>> whatever reason (e.g. less VM commit, finer granularity in VM maps, not<br>> wanting to have RO as writable even if for a short while.) So there's a<br>> trade-off to be made here (or an option to be added, even though we all<br>> want to avoid that if we can.)<br><br>Then you probably meant:<br><br>Old: R RX RW(RELRO) RW<br>New: R | RW(RELRO) RX RW<br>Runtime (before relocation resolving): R RW RX RW<br>Runtime (after relocation resolving): R R RX RW   ; the two R cannot be merged<br><br>| means a maxpagesize alignment. I am not sure whether you are going to add it<br>because I still do not understand where the saving comes from.<br><br>If the alignment is added, the R and RW maps can get contiguous<br>(non-overlapping) p_offset ranges. 
However, the RW map is private dirty,<br>so it cannot be merged with adjacent maps, and I am not clear how this can save kernel memory.</div><div dir="ltr"><br></div><div dir="ltr">If the alignment is not added, the two maps will get overlapping p_offset ranges.<br><br>> My test showed an overall ~1MB decrease in kernel slab memory usage on<br>> vm_area_struct, with about 150 processes running. For this to work, I had<br>> to modify the dynamic linker:<br><br>Can you elaborate on how this decreases the kernel slab memory usage on<br>vm_area_struct?  References to source code are very welcome :) This is<br>contrary to my intuition because the second R is private dirty.  The number of<br>VMAs does not decrease.<br><br>>   1. The dynamic linker needs to make the read-only VMA briefly writable in<br>> order for it to have the same VM flags with the RELRO VMA so that they can<br>> be merged. Specifically VM_ACCOUNT is set when a VMA is made writable.<br><br>Same question. I hope you can give a bit more detail.<br><br>> > How to layout the segments if --no-rosegment is specified?<br>> > Runtime (before relocation resolving): RX RW   ;      some people may be<br>> > concerned with writable stuff (relocated part) being made executable<br>> Indeed I think weakening in the security aspect may be a problem if we are<br>> to merge RELRO into RX. Keeping the old layout would be more<br>> preferable IMHO.<br><br>This means the new layout conflicts with --no-rosegment.<br>In Driver.cpp, there should be a "... cannot be used together" error.<br><br>> > Another problem is that in the default -z relro -z lazy (-z now not<br>> > specified) layout, .got and .got.plt will be separated by potentially huge<br>> > code sections (e.g. .text). 
I'm still thinking what problems this layout<br>> > change may bring.<br>> ><br>> Not sure if this is the same issue as what you mentioned here, but I also<br>> see a comment in lld/ELF/Writer.cpp about how .rodata and .eh_frame should<br>> be as close to .text as possible due to fear of relocation overflow. If we<br>> go with option 2 above, the distance would have to be made larger. With<br>> option 1, we may still have some leeway in how to order sections within the<br>> merged RELRO segment.<br><br>For huge executables (>2G or 3G), relocation overflows between .text and<br>.rodata may occur if other large sections like .dynsym and .dynstr are<br>placed in between.<br><br>I do not worry too much about overflows potentially caused by moving<br>PT_GNU_RELRO around.  PT_GNU_RELRO is usually less than 10% of the size of the<br>RX PT_LOAD.<br><br>> This would be a somewhat tedious change (especially the part about having<br>> to update all the unit tests), but the benefit is pretty good, especially<br>> considering the kernel slab memory is not swappable/evictable. Please let<br>> me know your thoughts!<br><br>Definitely! I have prototyped this and found that ~260 tests will need their addresses updated.</div></div>