new file mode 100644
@@ -0,0 +1,193 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+========================================
+Debugging advice for driver development
+========================================
+
+This document serves as a general starting point and lookup for debugging device
+drivers.
+While this guide focuses on debugging that requires re-compiling the
+module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
+will guide you through tools like dynamic debug, ftrace and other tools useful
+for debugging issues and behavior.
+For general debugging advice, see `general-debugging-guide <index.html>`__.
+
+.. contents::
+ :depth: 3
+
+Available tools
+===============
+
+Printk & friends
+----------------
+
+These are derivatives of printf() that differ in where their output ends up and
+in whether they can be turned on or off dynamically.
+
+.. _printk:
+
+**Simple printk**
+~~~~~~~~~~~~~~~~~
+
+The classic approach. It can be used to great effect for quick and dirty
+development of new modules or to extract arbitrary data needed for
+troubleshooting.
+
+Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
+
+**Pros**:
+
+- No need to learn anything, simple to use
+- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
+- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
+
+**Cons**:
+
+- Requires rebuilding the kernel/module
+- Can cause delays in the execution of the code (which can make issues non-reproducible)
+
+`Full documentation <../core-api/printk-basics.html>`__
+
+.. _trace_printk:
+
+**Trace_printk**
+~~~~~~~~~~~~~~~~
+
+Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
+
+It is a tiny bit less comfortable to use than `printk`_, because you will have
+to read the messages from the trace file (See: `Reading the ftrace log
+<userspace_debugging_guide.html#read-the-ftrace-log>`_) instead of from the
+kernel log, but it is very useful when printk adds unwanted delays into the code
+execution, causing issues to be flaky or hidden.
+
+If the processing of this still causes timing issues then you can try `trace_puts()`.
+
+`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
+
+**dev_dbg**
+~~~~~~~~~~~
+
+A print statement that can be targeted by `dynamic debug
+<userspace_debugging_guide.html#dynamic-debug>`__ and that contains additional
+information about the device used within the context.
+
+**When is it appropriate to leave a debug print in the code?**
+
+Permanent debug statements have to be useful for a developer to troubleshoot
+driver misbehavior. Judging that is a bit more of an art than a science, but
+some guidelines are in the `Coding style guide
+<../process/coding-style.html#printing-kernel-messages>`__.
+
+**Custom printk**
+~~~~~~~~~~~~~~~~~
+
+Example:
+::
+
+ #define core_dbg(fmt, arg...) do { \
+ if (core_debug) \
+ printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
+ } while (0)
+
+**When should you do this?**
+
+It is better to just use `pr_debug()`, which can later be turned on/off with
+dynamic debug. A lot of drivers instead activate such custom prints via a
+variable like `core_debug` that is set by a module parameter. However, module
+parameters `are not recommended anymore
+<https://lkml.org/lkml/2024/3/27/163>`_.
+
+Ftrace
+------
+
+**Creating custom Ftrace tracepoint**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
+
+`Full event tracing documentation <../trace/events.html>`__
+
+`Full Ftrace documentation <../trace/ftrace.html>`__
+
+DebugFS
+-------
+
+Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
+
+DebugFS differs from the other debugging approaches, as it neither writes
+messages to the kernel log nor adds traces to the code. Instead, it allows the
+developer to expose a set of files.
+With these files you can store the values of variables, provide register/memory
+dumps, or, if you make the files writable, modify values/settings in the driver.
+Possible use-cases, among others:
+
+- Store register values
+- Keep track of variables
+- Store errors
+- Store settings
+- Toggle a setting like debug on/off
+- Error injection
+
+This is especially useful when the size of a data dump would be hard to digest as
+part of the general kernel log (for example when dumping raw bitstream data) or
+when you are not interested in all the values all the time, but still want the
+possibility to inspect them.
+
+The general idea is:
+
+- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
+- Create a file (`debugfs_create_u32("my_value", 0444, parent, &my_variable);`)
+
+ - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
+ - any update of `my_variable` will update the value in the file
+
+- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
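+
+From userspace, the value exported in the example above could then be inspected
+roughly as follows. This is only a sketch: it reuses the hypothetical names
+`my_driver` and `my_value` from the steps above and assumes that it is run as
+root. Most distributions mount debugfs automatically, in which case the first
+command can be skipped:
+::
+
+ # mount -t debugfs none /sys/kernel/debug
+ # cat /sys/kernel/debug/my_driver/my_value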
+
+`Full documentation <../filesystems/debugfs.html>`__
+
+.. _error_checking:
+
+KASAN, UBSAN, lockdep and other error checkers
+----------------------------------------------
+
+KASAN (Kernel Address Sanitizer)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KASAN is a dynamic memory error detector that helps to find use-after-free and
+out-of-bounds bugs. It uses compile-time instrumentation to check every memory
+access.
+
+`Full documentation <../dev-tools/kasan.html>`__
+
+UBSAN (Undefined Behavior Sanitizer)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+UBSAN relies on compiler instrumentation and runtime checks to detect undefined
+behavior. It is designed to find a variety of issues, including signed integer overflow,
+array index out of bounds, and more.
+
+`Full documentation <../dev-tools/ubsan.html>`__
+
+lockdep (Lock Dependency Validator)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+lockdep is a runtime lock dependency validator that detects potential deadlocks
+and other locking-related issues in the kernel.
+It tracks lock acquisitions and releases, building a dependency graph that is
+analyzed for potential deadlocks.
+lockdep is especially useful for validating the correctness of lock ordering in
+the kernel.
+
+device coredump
+---------------
+
+Prerequisite: :code:`#include <linux/devcoredump.h>`
+
+Provides infrastructure through which the driver can provide arbitrary data to
+userland. It is most often used in conjunction with udev or similar userland
+infrastructure to listen for the kernel uevents, which indicate that the dump is
+ready. Udev then usually has rules to copy that file somewhere for long-term
+storage and analysis, as by default the data for the dump is automatically
+cleaned up after 5 minutes.
+That data is then analyzed with driver-specific tools or GDB.
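+
+Without any udev rules in place, a dump can also be collected by hand. The
+following sketch assumes that it is run as root and that a dump has just been
+created and shows up as `devcd1` (the number is only an example and differs
+from dump to dump); the name of the output file is arbitrary:
+::
+
+ # ls /sys/class/devcoredump/
+ # cat /sys/class/devcoredump/devcd1/data > /tmp/my_driver.devcoredump
+ # echo 1 > /sys/class/devcoredump/devcd1/data
+
+Writing to the `data` file signals that the dump has been read, so that it can
+be freed before the timeout expires.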
+
+You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
+
+**Copyright** |copy| 2024 : Collabora
new file mode 100644
@@ -0,0 +1,66 @@
+
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+====================================================
+General debugging advice for Linux Kernel developers
+====================================================
+
+.. toctree::
+ :maxdepth: 1
+
+ driver_development_debugging_guide
+ userspace_debugging_guide
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
+
+General debugging advice
+========================
+
+Depending on the issue, a different set of tools is available to track down the
+problem or even to realize whether there is one in the first place.
+
+As a first step you have to figure out what kind of issue you want to debug.
+Depending on the answer, your methodology and choice of tools may vary.
+
+Do I need to debug with limited access?
+---------------------------------------
+
+Do you have limited access to the machine or are you unable to stop the running execution?
+
+In this case your debugging capability depends on the built-in debugging support
+of the provided distro kernel.
+The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
+brief overview of the range of possible debugging tools in that situation. You
+can check the capabilities of your kernel, in most cases, by looking into the
+config file within the /boot folder.
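+
+A minimal sketch of such a check, assuming that the distribution installs the
+config as `/boot/config-<release>` (common, but not universal) and using a few
+of the options mentioned in these guides as examples:
+::
+
+ $ grep -E '^CONFIG_(PRINTK|DEBUG_FS|DYNAMIC_DEBUG|DYNAMIC_FTRACE)=' /boot/config-$(uname -r)
+
+Options that do not show up in the output are not enabled in that kernel.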
+
+Do I have root access to the system?
+------------------------------------
+
+Are you easily able to replace the module in question or to install a new kernel?
+
+In that case your range of available tools is a lot bigger. You can find the
+tools `here <driver_development_debugging_guide.html>`__.
+
+Is timing a factor?
+-------------------
+
+It is important to understand if the problem you want to debug manifests itself
+consistently (i.e. given a set of inputs you always get the same, incorrect
+output), or inconsistently. If it manifests itself inconsistently, some timing
+factor might be at play. If inserting delays into the code does change the
+behavior, then quite likely timing is a factor.
+
+When timing does alter the outcome of the code execution, using a simple `printk
+<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
+work. A similar alternative is to use `trace_printk
+<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
+messages to the trace file instead of the kernel log.
+
+**Copyright** |copy| 2024 : Collabora
new file mode 100644
@@ -0,0 +1,269 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==========================
+Userspace debugging advice
+==========================
+
+A brief overview of common tools to debug the Linux Kernel from userspace.
+For debugging advice aimed at driver developers, go `here <driver_development_debugging_guide.html>`__.
+For general debugging advice, see `general-debugging-guide <index.html>`__.
+
+.. contents::
+ :depth: 3
+
+Available tools
+===============
+
+Dynamic debug
+-------------
+
+Mechanism to filter what ends up in the kernel log by enabling or disabling log
+messages.
+
+Prerequisite: `CONFIG_DYNAMIC_DEBUG`
+
+.. _valid_dyndbg_prints:
+
+Dynamic debug is only able to target:
+
+- `pr_debug()`
+- `dev_dbg()`
+- `print_hex_dump_debug()`
+- `print_hex_dump_bytes()`
+
+Therefore the usability of this tool is quite limited in the media subsystem,
+because, as of now, there is no uniform rule for adding debug prints to the codebase,
+resulting in a variety of ways these prints are implemented.
+
+Also, note that most debug statements are implemented as a variation of
+`dprintk`, which has to be activated via a parameter in the respective module;
+dynamic debug is unable to do that step for you.
+
+Here is one example that enables all available `pr_debug()` statements within the file:
+::
+
+ $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
+ $ ddcmd '-p; file v4l2-h264.c +p'
+ $ grep =p /proc/dynamic_debug/control
+ drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
+ drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
+
+**When should you use this over** `Ftrace`_ **?**
+
+- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints>` or when you have added multiple pr_debug() statements during development
+- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
+- When you care more about receiving specific log messages than tracing the pattern of how a function is called
+
+`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
+
+Ftrace
+------
+
+Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
+
+Trace whenever a file is opened:
+::
+
+ $ cd /sys/kernel/tracing
+ $ echo function > /sys/kernel/tracing/current_tracer
+ $ echo do_filp_open > set_ftrace_filter
+ $ echo 1 > tracing_on
+ $ cat trace
+ find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
+ find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
+ find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
+
+.. _event_tracing:
+
+Activate a ftrace event on top of that:
+::
+
+ $ echo 1 > events/kmem/kfree/enable
+ find-5351 [005] ...1. 678288.910143: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910185: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ...1. 678288.910218: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910260: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ...1. 678288.910293: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910345: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ..... 678288.910389: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000007ba73e40
+ find-5351 [005] ..... 678288.910390: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000009e4850bc
+ find-5351 [005] ..... 678288.910391: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000004156f20f
+ find-5351 [005] ..... 678288.910393: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=00000000c7207e20
+ find-5351 [005] ..... 678288.910394: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000037c31d76
+ find-5351 [005] ..... 678288.910395: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000001922677
+
+.. _read_ftrace_log:
+
+**Reading the ftrace log**
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `trace` file can be read just like any other file (`cat`, `tail`, `head`,
+`vim`, etc.). The size of the file is limited by `buffer_size_kb` (``echo
+1000 > buffer_size_kb``). The `trace_pipe` file behaves similarly to the `trace`
+file, but whenever you read from it the content is consumed.
+
+**Kernelshark**
+~~~~~~~~~~~~~~~
+
+A GUI to visualize the traces as a graph and list view from the
+output of the `trace-cmd
+<https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/>`__ application.
+
+`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_
+
+`Full Ftrace documentation <../trace/ftrace.html>`__
+
+Perf & alternatives
+-------------------
+
+The tools mentioned above provide ways to inspect kernel code, results, variable values, etc.
+Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue.
+
+.. _performance:
+
+**Why should you do a performance analysis?**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A performance analysis is a good first step when, among other reasons:
+
+- you cannot define the issue
+- you do not know where it occurs
+- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel
+
+.. _linux-tools:
+
+**How to do a simple analysis with linux tools?**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To begin a performance analysis, you can start with the usual tools like:
+
+- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*)
+- `mpstat -P ALL` (*look at the load distribution among CPUs*)
+- `iostat -x` (*observe input and output devices utilization and performance*)
+- `vmstat` (*overview of memory usage on the system*)
+- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*)
+- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*)
+
+These should help to sufficiently narrow down the areas to look at.
+
+**Diving deeper with perf**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The **perf** tool provides a series of metrics and events to further dial down on issues.
+
+Prerequisite: build or install perf on your system
+
+Gather statistics data for finding all files starting with `gcc` in `/usr`:
+::
+
+ # perf stat -d find /usr -name 'gcc*' | wc -l
+
+ Performance counter stats for 'find /usr -name gcc*':
+
+ 1277.81 msec task-clock # 0.997 CPUs utilized
+ 9 context-switches # 7.043 /sec
+ 1 cpu-migrations # 0.783 /sec
+ 704 page-faults # 550.943 /sec
+ 766548897 cycles # 0.600 GHz (97.15%)
+ 798285467 instructions # 1.04 insn per cycle (97.15%)
+ 57582731 branches # 45.064 M/sec (2.85%)
+ 3842573 branch-misses # 6.67% of all branches (97.15%)
+ 281616097 L1-dcache-loads # 220.390 M/sec (97.15%)
+ 4220975 L1-dcache-load-misses # 1.50% of all L1-dcache accesses (97.15%)
+ <not supported> LLC-loads
+ <not supported> LLC-load-misses
+
+ 1.281746009 seconds time elapsed
+
+ 0.508796000 seconds user
+ 0.773209000 seconds sys
+
+
+ 52
+
+The availability of events and metrics depends on the system you are running.
+
+`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__
+
+**Perfetto**
+~~~~~~~~~~~~
+
+A set of tools to measure and analyze how well applications and systems perform.
+You can use it to:
+
+* identify bottlenecks
+* optimize code
+* make software run faster and more efficiently.
+
+**What is the difference between perfetto and perf?**
+
+* perf is a tool that is part of, and specialized for, the Linux Kernel and has
+  a CLI user interface.
+* perfetto is a cross-platform performance analysis stack that extends its
+  functionality into userspace and provides a web user interface.
+
+`Full documentation <https://perfetto.dev/docs/>`__
+
+.. _kernel_panic_analysis_tools:
+
+Kernel panic analysis tools
+---------------------------
+
+ To analyse the crash dump please use `Kdump` & `Kexec`.
+
+ `Full documentation <../admin-guide/kdump/kdump.html>`__
+
+ In order to find the corresponding line in the code you can use `faddr2line
+ <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note
+ that you need to enable `CONFIG_DEBUG_INFO` for that to work.
+
+ An alternative to using `faddr2line` is the use of `objdump` (and its
+ derivatives for the different platforms like `aarch64-linux-gnu-objdump`),
+ take this line as an example:
+
+ `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`.
+
+ We can find the corresponding line of code by executing:
+ ::
+
+ aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40
+ 0000000000000ac8 <rkvdec_device_run>:
+ ac8: d503201f nop
+ acc: d503201f nop
+ {
+ ad0: d503233f paciasp
+ ad4: a9bd7bfd stp x29, x30, [sp, #-48]!
+ ad8: 910003fd mov x29, sp
+ adc: a90153f3 stp x19, x20, [sp, #16]
+ ae0: a9025bf5 stp x21, x22, [sp, #32]
+ const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
+ ae4: f9411814 ldr x20, [x0, #560]
+ struct rkvdec_dev *rkvdec = ctx->dev;
+ ae8: f9418015 ldr x21, [x0, #768]
+ if (WARN_ON(!desc))
+ aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec>
+ ret = pm_runtime_resume_and_get(rkvdec->dev);
+ af0: f943d2b6 ldr x22, [x21, #1952]
+ ret = __pm_runtime_resume(dev, RPM_GET_PUT);
+ af4: aa0003f3 mov x19, x0
+ af8: 52800081 mov w1, #0x4 // #4
+ afc: aa1603e0 mov x0, x22
+ b00: 94000000 bl 0 <__pm_runtime_resume>
+ if (ret < 0) {
+ b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4>
+ dev_warn(rkvdec->dev, "Not good\n");
+ b08: f943d2a0 ldr x0, [x21, #1952]
+ b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8>
+ b10: 91000021 add x1, x1, #0x0
+ b14: 94000000 bl 0 <_dev_warn>
+ *bad = 1;
+ b18: d2800001 mov x1, #0x0 // #0
+ ...
+
+
+ To find the matching line we just have to add `0x50` (from
+ `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8
+ <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`.
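+
+ The addition can be double-checked in the shell, and `faddr2line` performs the
+ same lookup when it is given the module and the symbol offset from the log
+ line. This is a sketch that assumes it is run from the top of the kernel tree
+ in which the module was built with debug information:
+ ::
+
+   $ printf '0x%x\n' $((0xac8 + 0x50))
+   0xb18
+   $ ./scripts/faddr2line drivers/staging/media/rkvdec/rockchip-vdec.ko rkvdec_device_run+0x50/0x138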
+
+**Copyright** |copy| 2024 : Collabora
@@ -57,6 +57,7 @@ Various other manuals with useful information for all kernel developers.
Testing guide <dev-tools/testing-overview>
Hacking guide <kernel-hacking/index>
Tracing <trace/index>
+ Debugging <debugging/index>
Fault injection <fault-injection/index>
Livepatching <livepatch/index>
Rust <rust/index>
@@ -76,6 +77,7 @@ developers seeking information on the kernel's user-space APIs.
Build system <kbuild/index>
Reporting issues <admin-guide/reporting-issues.rst>
Userspace tools <tools/index>
+ Userspace debugging tools <debugging/userspace_debugging_guide.rst>
Userspace API <userspace-api/index>
See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,