[lldb-dev] lldb_private::RegisterContext vs lldb_private::RegisterInfoInterface

Ramana via lldb-dev lldb-dev at lists.llvm.org
Fri Sep 22 04:08:33 PDT 2017


On Thu, Sep 21, 2017 at 8:40 PM, Greg Clayton <clayborg at gmail.com> wrote:
>
>> On Sep 21, 2017, at 5:15 AM, Ramana <ramana.venkat83 at gmail.com> wrote:
>>
>> Sorry, I could not respond yesterday as I was out of the office.
>>
>>> Interesting. There are two ways to accomplish this:
>>> 1 - Treat the CPU as one target and the GPU as another.
>>> 2 - Treat the CPU and GPU as one target
>>>
>>> The tricky things with solution #1 is how to manage switching the targets
>>> between the CPU and GPU when events happen (CPU stops, or GPU stops while
>>> the other is running or already stopped). We don't have any formal
>>> "cooperative targets" yet, but we know they will exist in the future
>>> (client/server, vm code/vm debug of vm code, etc) so we will be happy to
>>> assist with questions if and when you get there.
>>
>> I was going along with option #1, and I will definitely post here with more
>> questions as I progress, thank you. Fortunately, the way the OpenVX APIs
>> work is that, after off-loading tasks to the GPU, they wait for the
>> GPU to complete those tasks before continuing further. And in our
>> case, both CPU and GPU can be controlled separately. Given that, do
>> you think I still need to bother much about "cooperative targets"?
>
> If you just want to make two targets that know nothing about each other, then that is very easy. Is that what you were asking?

I am probably not getting the significance of "cooperative targets"
for our setup at this point. I will get back after I dig a little
deeper.

>
>>
>>> GPU debugging is tricky since they usually don't have a kernel or anything
>>> running on the hardware. Many examples I have seen so far will set a
>>> breakpoint in the program at some point by compiling the code with a
>>> breakpoint inserted, run to that breakpoint, and then if the user wants to
>>> continue, you recompile with breakpoints set at a later place and re-run the
>>> entire program again. Is your GPU any different?
>>
>>> We also discussed how to single step in a GPU program. Since multiple cores
>>> on the GPU are concurrently running the same program, there was discussion
>>> on how single stepping would work. If you are stepping and run into an
>>> if/then statement, do you walk through the if and the else at all times? One
>>> GPU professional was saying this is how GPU folks would want to see single
>>> stepping happen. So I think there is a lot of stuff we need to think about
>>> when debugging GPUs in general.
>>
>> Thanks for sharing that. Yeah, ours is a little different. Basically,
>> from the top level, the affinity in our case is per core of the GPU. I
>> am not far enough along yet to discuss this in more detail.
>
> ok, let me know when you are ready to ask more questions.
>
>>
>>> So we currently have no cooperative targets in LLDB. This will be the first.
>>> We will need to discuss how hand off between the targets will occur and many
>>> other aspects. We will be sure to comment when and if you get to this point.
>>
>> Thank you. Will post more when I get there.
>
> Sounds good.
>>
>> Regards,
>> Ramana
>>
>> On Tue, Sep 19, 2017 at 8:56 PM, Greg Clayton <clayborg at gmail.com> wrote:
>>>
>>> On Sep 19, 2017, at 3:32 AM, Ramana <ramana.venkat83 at gmail.com> wrote:
>>>
>>> Thank you so much Greg for your comments.
>>>
>>> What architecture and os are you looking to support?
>>>
>>>
>>> The OS is Linux and the primary use scenario is remote debugging.
>>> Basically http://lists.llvm.org/pipermail/lldb-dev/2017-June/012445.html
>>> is what I am trying to achieve and unfortunately that query did not
>>> get much attention of the members.
>>>
>>>
>>> Sorry about missing that. I will attempt to address this now:
>>>
>>> I have to implement a debugger for our HW, which comprises a CPU and a GPU,
>>> where the GPU is programmed in OpenCL and is driven through the OpenVX API
>>> from a C++ application running on the CPU. Our requirement is that we should
>>> be able to debug the code running on both the CPU and the GPU simultaneously
>>> within the same LLDB debug session.
>>>
>>>
>>> Interesting. There are two ways to accomplish this:
>>> 1 - Treat the CPU as one target and the GPU as another.
>>> 2 - Treat the CPU and GPU as one target
>>>
>>> There are tricky areas for both, but for sanity I would suggest option #1.
>>>
>>> The tricky things with solution #1 is how to manage switching the targets
>>> between the CPU and GPU when events happen (CPU stops, or GPU stops while
>>> the other is running or already stopped). We don't have any formal
>>> "cooperative targets" yet, but we know they will exist in the future
>>> (client/server, vm code/vm debug of vm code, etc) so we will be happy to
>>> assist with questions if and when you get there.
>>>
>>> Option #2 would be tricky as this would be the first target that has
>>> multiple architectures within one process. If the CPU and GPU can be
>>> controlled separately, then I would go with option #1 as LLDB currently
>>> always stops all threads in a process when any thread stops. You would also
>>> need to implement different register contexts for each thread within such a
>>> target. It hasn't been done yet, other than through the OS plug-ins that can
>>> provide extra threads to show in case you are doing some sort of user space
>>> threading.
>>>
>>> GPU debugging is tricky since they usually don't have a kernel or anything
>>> running on the hardware. Many examples I have seen so far will set a
>>> breakpoint in the program at some point by compiling the code with a
>>> breakpoint inserted, run to that breakpoint, and then if the user wants to
>>> continue, you recompile with breakpoints set at a later place and re-run the
>>> entire program again. Is your GPU any different? Since they will be used in
>>> an OpenCL context, maybe your solution is better? We also had discussions on
>>> how to represent the various "waves" or sets of cores running the same
>>> program on the GPU. The easiest solution is to make one thread per distinct
>>> core on the GPU. The harder way would be to treat a thread as a collection
>>> of multiple cores, where each variable can now have one value per core.
>>>
>>> We also discussed how to single step in a GPU program. Since multiple cores
>>> on the GPU are concurrently running the same program, there was discussion
>>> on how single stepping would work. If you are stepping and run into an
>>> if/then statement, do you walk through the if and the else at all times? One
>>> GPU professional was saying this is how GPU folks would want to see single
>>> stepping happen. So I think there is a lot of stuff we need to think about
>>> when debugging GPUs in general.
>>>
>>> Looking at the mailing list archive I see that there were discussions about
>>> this feature in LLDB here
>>> http://lists.llvm.org/pipermail/lldb-dev/2014-August/005074.html.
>>>
>>> What is the present status of simultaneous multiple-target debugging support
>>> in LLDB, i.e. what works today and what remains to be improved? Were the
>>> changes contributed to the LLDB mainline?
>>>
>>>
>>> So we currently have no cooperative targets in LLDB. This will be the first.
>>> We will need to discuss how hand off between the targets will occur and many
>>> other aspects. We will be sure to comment when and if you get to this point.
>>>
>>> How can I access the material for http://llvm.org/devmtg/2014-10/#bof5
>>> (Future directions and features for LLDB)?
>>>
>>> Over the years we have talked about this, but it never really got into any
>>> real amount of detail and I don't think the BoF notes will help you much.
>>>
>>> Appreciate any help/guidance provided on the same.
>>>
>>> I do believe approach #1 will work the best. The easiest thing you can do is
>>> to insulate LLDB from the GPU by putting it behind a GDB server boundary.
>>> Then we need to really figure out how we want to do GPU debugging.
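>>>
>>> As a rough sketch (illustrative only; the binary names and host:port pairs
>>> are made up), option #1 from the lldb command line could look something like
>>> this, assuming your GPU debug stub speaks the gdb-remote protocol:
>>>
>>>   (lldb) target create cpu_app.elf
>>>   (lldb) gdb-remote cpu-host:1234     # lldb-server debugging the CPU side
>>>   (lldb) target create gpu_kernel.elf
>>>   (lldb) gdb-remote gpu-host:5678     # your GPU debug stub
>>>   (lldb) target list                  # two independent targets
>>>   (lldb) target select 0              # switch between them by hand
>>>
>>> Coordinating stops and hand-off between the two targets is the "cooperative
>>> targets" part that still needs to be designed.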
>>>
>>> Hopefully this filled in your missing answers. Let me know what questions
>>> you have.
>>>
>>> Greg
>>>
>>> Thanks,
>>> Ramana
>>>
>>> On Mon, Sep 18, 2017 at 8:46 PM, Greg Clayton <clayborg at gmail.com> wrote:
>>>
>>> When supporting a new architecture, our preferred route is to modify
>>> lldb-server (a GDB server binary that supports native debugging) to support
>>> your architecture. Why? Because this gets you remote debugging for free. If
>>> you go this route, then you will subclass a
>>> lldb_private::NativeRegisterContext and that will get used by lldb-server
>>> (along with lldb_private::NativeProcessProtocol and
>>> lldb_private::NativeThreadProtocol). If you are adding a new architecture to
>>> Linux, then you will likely just need to subclass NativeRegisterContext.
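>>>
>>> For a rough idea of the shape of that work, here is a sketch only; the class
>>> name is made up, and the exact set of pure virtuals and their signatures
>>> should be checked against NativeRegisterContext.h in your LLDB tree (the
>>> existing Linux ports also go through a Linux-specific intermediate base
>>> class):
>>>
>>>   // Sketch: per-thread register access for a new architecture in lldb-server.
>>>   class NativeRegisterContextMyArch : public lldb_private::NativeRegisterContext {
>>>   public:
>>>     // Describe the registers: how many, and how they are grouped into
>>>     // register sets (GPR, FPU, ...).
>>>     uint32_t GetRegisterCount() const override;
>>>     const lldb_private::RegisterInfo *
>>>     GetRegisterInfoAtIndex(uint32_t reg_index) const override;
>>>     uint32_t GetRegisterSetCount() const override;
>>>     const lldb_private::RegisterSet *
>>>     GetRegisterSet(uint32_t set_index) const override;
>>>
>>>     // Read/write a single register of the stopped thread (ptrace on Linux).
>>>     lldb_private::Status ReadRegister(const lldb_private::RegisterInfo *reg_info,
>>>                                       lldb_private::RegisterValue &reg_value) override;
>>>     lldb_private::Status WriteRegister(const lldb_private::RegisterInfo *reg_info,
>>>                                        const lldb_private::RegisterValue &reg_value) override;
>>>
>>>     // Bulk save/restore of all register values (used around expression
>>>     // evaluation).
>>>     lldb_private::Status ReadAllRegisterValues(lldb::DataBufferSP &data_sp) override;
>>>     lldb_private::Status WriteAllRegisterValues(const lldb::DataBufferSP &data_sp) override;
>>>   };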
>>>
>>> The other way to go is to subclass lldb_private::Process,
>>> lldb_private::Thread and lldb_private::RegisterContext.
>>>
>>> The nice thing about the lldb_private::Native* subclasses is that you only
>>> need to worry about native support. You can use #ifdef and use system header
>>> files, whereas with the non-native route those classes (lldb_private::Process,
>>> lldb_private::Thread and lldb_private::RegisterContext) need to be able to
>>> debug remotely and cannot rely on system headers, since they can be compiled
>>> on any system for possibly local debugging (if the current arch/vendor/os
>>> matches the current system) and for remote debugging (if you use lldb-server
>>> or another form of RPC).
>>>
>>> I would highly suggest going the lldb-server route, as then you can use
>>> system header files that contain the definitions of the registers and you
>>> only need to worry about the native architecture. Linux uses ptrace and has
>>> much of the common code factored out into the correct classes (POSIX ptrace,
>>> Linux specifics, and more).
>>>
>>> What architecture and os are you looking to support?
>>>
>>> Greg Clayton
>>>
>>> On Sep 16, 2017, at 6:28 AM, Ramana <ramana.venkat83 at gmail.com> wrote:
>>>
>>> Thank you Greg for the detailed response.
>>>
>>> Can you please also shed some light on NativeRegisterContext? When
>>> do we need to subclass NativeRegisterContext, and (how) is it
>>> related to RegisterContext<OS>_<Arch>?
>>> It appears that not all architectures having
>>> RegisterContext<OS>_<Arch> have subclassed NativeRegisterContext.
>>>
>>> Regards,
>>> Ramana
>>>
>>> On Thu, Sep 14, 2017 at 9:02 PM, Greg Clayton <clayborg at gmail.com> wrote:
>>>
>>> Seems like this class was added for testing. RegisterInfoInterface is a
>>> class that creates a common API for getting lldb_private::RegisterInfo
>>> structures.
>>>
>>> A RegisterContext<OS>_<Arch> class uses one of these to be able to create a
>>> buffer large enough to store all registers defined in the
>>> RegisterInfoInterface and will actually read/write those registers to/from
>>> the debugged process. RegisterContext also caches register values so they
>>> don't get read multiple times when the process hasn't resumed. A
>>> RegisterContext subclass is needed for each architecture so we can
>>> dynamically tell LLDB what the registers look like for a given architecture.
>>> It also provides abstractions by letting each register define its register
>>> numbers for compilers, DWARF, and generic register numbers like PC, SP, FP,
>>> return address, and flags registers. This allows the generic part of LLDB to
>>> say "I need you to give me the PC register for this thread" and we don't
>>> need to know that the register is "eip" on x86, "rip" on x86_64, or "r15" on
>>> ARM. RegisterContext classes can also determine how registers are
>>> read/written: one at a time, or "get all general purpose regs" and "get all
>>> FPU regs". So if someone asks a RegisterContext to read the PC, it might go
>>> read all GPR regs and then mark them all as valid in the register context
>>> buffer cache, so if someone subsequently asks for SP, it will already be
>>> cached.
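>>>
>>> To illustrate the caching idea (a sketch only, not actual LLDB code; the
>>> m_gpr/m_gpr_valid members and the ReadGPRFromProcess() helper are
>>> hypothetical, and the exact virtual signatures should be checked in your
>>> tree):
>>>
>>>   // Sketch: cache-on-first-read for the GPR block. Reading any GPR fetches
>>>   // the whole block once and serves later reads from the cache until the
>>>   // process resumes and the cache is invalidated.
>>>   bool RegisterContextMyArch::ReadRegister(const lldb_private::RegisterInfo *info,
>>>                                            lldb_private::RegisterValue &value) {
>>>     if (!m_gpr_valid) {
>>>       if (!ReadGPRFromProcess(m_gpr))   // hypothetical helper: one bulk read
>>>         return false;
>>>       m_gpr_valid = true;               // every GPR is now served from the cache
>>>     }
>>>     value.SetUInt64(m_gpr[info->kinds[lldb::eRegisterKindLLDB]]);
>>>     return true;
>>>   }
>>>
>>>   void RegisterContextMyArch::InvalidateAllRegisters() {
>>>     m_gpr_valid = false;                // called when the process resumes
>>>   }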
>>>
>>> So RegisterInfoInterface defines a common API that many RegisterContext
>>> classes can use in order to give out the lldb_private::RegisterInfo
>>> (which is required by all subclasses of RegisterContext) info for a register
>>> context, and RegisterContext is the one that actually interfaces with
>>> the debugged process in order to read/write and cache those registers as
>>> efficiently as possible for the current program being debugged.
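>>>
>>> A minimal sketch of the RegisterInfoInterface side (illustrative only; the
>>> GPR struct, g_register_infos table and k_num_register_infos count are
>>> assumed to be defined elsewhere, and the pure virtuals should be checked
>>> against RegisterInfoInterface.h in your tree):
>>>
>>>   class RegisterInfosMyArch : public lldb_private::RegisterInfoInterface {
>>>   public:
>>>     RegisterInfosMyArch(const lldb_private::ArchSpec &arch)
>>>         : lldb_private::RegisterInfoInterface(arch) {}
>>>
>>>     // Size of the raw general-purpose register block.
>>>     size_t GetGPRSize() const override { return sizeof(GPR); }
>>>
>>>     // Static table describing every register: name, size, offset, and the
>>>     // register numbers for DWARF, the compiler, and the generic kinds.
>>>     const lldb_private::RegisterInfo *GetRegisterInfo() const override {
>>>       return g_register_infos;
>>>     }
>>>
>>>     uint32_t GetRegisterCount() const override {
>>>       return k_num_register_infos;
>>>     }
>>>   };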
>>>
>>> On Sep 12, 2017, at 10:59 PM, Ramana via lldb-dev <lldb-dev at lists.llvm.org>
>>> wrote:
>>>
>>> Hi,
>>>
>>> When deriving RegisterContext<OS>_<Arch>, why do some platforms (Arch+OS)
>>> derive it from lldb_private::RegisterContext while others derive from
>>> lldb_private::RegisterInfoInterface? In other words, how does one decide
>>> which of those two base classes to derive from, and what are the
>>> implications?
>>>
>>> Thanks,
>>> Ramana
>>>
>>>
>>>
>>>
>

