Generating and maintaining thousands of LTTng tracepoints for an existing C API—and then unpacking them with Babeltrace2—can quickly become tedious and error‑prone. In this talk, we’ll introduce two complementary open source tools that automate the end‑to‑end process:
- h2yaml: A Clang‑based Python utility that parses your C header files and emits a human‑readable YAML description used to generate the future tracepoints (names, arguments, types, etc.). Under the hood, it relies on libclang’s Python bindings and their undocumented idiosyncrasies.
- Metababel: A Babeltrace2 plugin generator that consumes the YAML manifest produced by h2yaml and generates the corresponding unpacking code, making writing Babeltrace2 plugins a breeze.
We’ll demonstrate both tools in action—using CUDA, MPI, and Intel Level Zero as concrete examples—and show how this workflow reduces development time, eliminates copy‑paste errors, and keeps your trace instrumentation and Babeltrace2 plugins in sync with API changes.
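As a rough illustration of what these tools automate, here is a minimal, hand-written sketch of the kind of LTTng-UST tracepoint such a workflow revolves around, for a single hypothetical CUDA entry point (the actual generated code may differ):

```c
/*
 * Hand-written sketch of an LTTng-UST tracepoint provider for one
 * hypothetical CUDA entry point (classic LTTng-UST macro names).
 * Real h2yaml/Metababel output may look different; this only shows the
 * kind of per-function tracepoint the workflow keeps in sync with the API.
 */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER cuda_api

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./cuda_api_tp.h"

#if !defined(CUDA_API_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define CUDA_API_TP_H

#include <lttng/tracepoint.h>

/* One "entry" event mirroring the C prototype of cuMemAlloc(). */
TRACEPOINT_EVENT(
    cuda_api,
    cuMemAlloc_entry,
    TP_ARGS(
        size_t, bytesize
    ),
    TP_FIELDS(
        ctf_integer(size_t, bytesize, bytesize)
    )
)

#endif /* CUDA_API_TP_H */

#include <lttng/tracepoint-event.h>

/* Call site, typically placed in an interception wrapper:
 *     tracepoint(cuda_api, cuMemAlloc_entry, bytesize);
 */
```

Multiplied by thousands of API entry points, and mirrored by the matching Babeltrace2 unpacking code, this is exactly the boilerplate the two tools generate and keep consistent.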
Graphics Processing Units (GPUs) have evolved over the past decades to become a key element of many computing systems, thanks to their affordable raw computing power. However, they require intricate knowledge of their behavior to achieve the best performance they can offer. GPU developers usually rely on tools to analyze the performance of their code, but existing tools fall short, as they are unable to provide detailed data from the device.
In this presentation, we explore tracing methods for GPU compute kernels. We first discuss implementation methods for efficient tracing that alleviate tracing challenges. We then present possible tracing schemes for GPUs and study their performance on a GPU-accelerated computing benchmark. Lastly, we discuss the instrumentation challenges of SIMT code and how to best place tracepoints.
This talk presents recent features, ongoing development, and the roadmap of upcoming releases of the LTTng project.
This presentation outlines an augmented lock‑free transaction protocol for LTTng-UST shared‑memory ring buffers that resolves long‑standing robustness problems when a producer stalls during a transaction (e.g., at breakpoints or when receiving a termination signal). By extending the transaction protocol with a sub-buffer ownership semantic and allowing the consumer daemon to safely abort or recover from incomplete transactions, the updated protocol prevents loss of events and eliminates situations where a destroyed session becomes stuck, waiting indefinitely for a stalled producer to finish its transaction, while preserving the high‑performance lock‑free nature of the original design. Extensive reproducible testing confirms the correctness of the protocol. The upcoming LTTng 2.15 release will integrate these improvements, delivering a more reliable and adaptable LTTng ecosystem.
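The abstract above describes the design at a high level; as a purely conceptual sketch (not the actual LTTng-UST ring-buffer code), the ownership idea can be pictured as an atomic word per sub-buffer that producer and consumer race for with compare-and-swap:

```c
/* Conceptual illustration only: this is NOT the real LTTng-UST protocol.
 * It shows how a per-sub-buffer "ownership" word lets the consumer daemon
 * reclaim a sub-buffer whose producer stalled mid-transaction. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum owner { OWNER_NONE = 0, OWNER_PRODUCER = 1, OWNER_CONSUMER = 2 };

struct subbuf {
    _Atomic uint32_t owner;       /* who currently owns the sub-buffer   */
    _Atomic uint64_t commit_seq;  /* last fully committed write position */
};

/* Producer: claim the sub-buffer before writing an event. */
static bool producer_begin(struct subbuf *sb)
{
    uint32_t expected = OWNER_NONE;
    return atomic_compare_exchange_strong(&sb->owner, &expected,
                                          OWNER_PRODUCER);
}

/* Producer: publish the write and release ownership. Publication only
 * succeeds if the consumer did not reclaim the sub-buffer while the
 * producer was stalled; otherwise the transaction counts as aborted. */
static bool producer_commit(struct subbuf *sb, uint64_t new_seq)
{
    uint32_t expected = OWNER_PRODUCER;

    atomic_store(&sb->commit_seq, new_seq);   /* data before release */
    return atomic_compare_exchange_strong(&sb->owner, &expected, OWNER_NONE);
}

/* Consumer: if the producer is stuck (e.g. stopped at a breakpoint),
 * take ownership and discard the incomplete transaction instead of
 * waiting forever for a commit that will never come. */
static bool consumer_reclaim(struct subbuf *sb)
{
    uint32_t expected = OWNER_PRODUCER;
    return atomic_compare_exchange_strong(&sb->owner, &expected,
                                          OWNER_CONSUMER);
}
```

The real protocol must additionally handle commit counters, memory ordering and multiple sub-buffers, but the CAS-based ownership hand-off captures why neither side ever blocks on the other.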
A new method of user-space stack tracing is being developed in the kernel that does not rely on frame pointers. This method is called SFrames. SFrames is a section in the ELF executable that holds two tables: the current instruction pointer is looked up in the first table (via a binary search), which points to an entry in the second table telling the kernel how to find the return address of the current function. That return address can then be looked up in the same way, and repeating the process builds the stack trace.
For the kernel to have access to these tables, which live in user space, it has to be able to handle a major page fault, because the table may still be on disk and must be read into memory via the page fault handler. There are only two places where generic kernel code knows it is safe to handle a major page fault: when the task is entering or exiting the kernel.
Because profilers usually trigger interrupts or NMIs to capture the stack trace, including the kernel stack, the kernel must defer reading the SFrame tables until the task returns to user space. This talk will discuss the proposed implementation, the issues with handling deferred stack traces, and the various other aspects that this new feature will provide.
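A simplified sketch of the two-table lookup described above is shown below. These structures are not the real SFrame on-disk format; they only illustrate the binary-search-then-offset scheme used to recover a return address.

```c
#include <stddef.h>
#include <stdint.h>

/* First table: one entry per function, sorted by start address. */
struct func_desc {
    uint64_t start_addr;   /* function start (sorted, binary-searchable) */
    uint32_t row_index;    /* index of this function's rows in table two */
    uint32_t row_count;
};

/* Second table: how to compute the CFA and return address at a given
 * offset into the function. */
struct frame_row {
    uint32_t ip_offset;    /* offset from function start this row covers */
    int32_t  cfa_offset;   /* CFA = SP + cfa_offset                      */
    int32_t  ra_offset;    /* return address stored at CFA + ra_offset   */
};

/* Binary search the first table for the function containing `ip`. */
static const struct func_desc *
lookup_func(const struct func_desc *fdes, size_t n, uint64_t ip)
{
    size_t lo = 0, hi = n;
    const struct func_desc *best = NULL;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fdes[mid].start_addr <= ip) {
            best = &fdes[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Pick the row covering `ip` (assumes at least one row per function) and
 * compute the address where the return address is stored. */
static uint64_t
return_addr_slot(const struct func_desc *fd, const struct frame_row *rows,
                 uint64_t ip, uint64_t sp)
{
    const struct frame_row *row = &rows[fd->row_index];

    for (uint32_t i = 1; i < fd->row_count; i++) {
        if (rows[fd->row_index + i].ip_offset > ip - fd->start_addr)
            break;
        row = &rows[fd->row_index + i];
    }
    return sp + row->cfa_offset + row->ra_offset;
}
```

Walking the stack is then a loop: look up the current IP, read the return address it points to, and repeat with that address as the new IP.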
Tracing FPGAs in heterogeneous systems remains a significant challenge. Vendor-provided hardware tracing tools often have limited trace durations, which restrict visibility into complex and long-running issues. When combined with the inherent complexity of heterogeneous platforms, this limitation makes debugging and performance analysis especially difficult. Tracing software and hardware separately further complicates matters, as aligning software events with FPGA logic becomes a tedious and error-prone task, leaving designers without a clear understanding of system-wide behaviors. This presentation introduces a solution that bridges this gap: a combined tracing approach that integrates FPGA and software event traces into a unified view. Our work comprises an FPGA core capable of tracing AXI transactions, signals, and state machines, along with a software component that collects and processes the trace data. The FPGA core timestamps events and transmits them over DMA to software, where they are stored with synchronized time markers for post-processing. The result is a cohesive, time-aligned trace that can be analyzed in conjunction with software traces, offering designers deeper insight into system interactions and making root cause analysis more accessible.
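To make the data path concrete, here is a hypothetical layout of a timestamped FPGA trace record and the software side that drains it from a DMA buffer; field names and sizes are illustrative, not the format used by the presented core.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

struct fpga_trace_record {
    uint64_t hw_timestamp;   /* cycle counter sampled by the FPGA core */
    uint16_t source_id;      /* AXI channel, signal group or FSM id    */
    uint16_t event_type;     /* e.g. AXI read/write, state transition  */
    uint32_t payload;        /* address, data word or FSM state        */
};

/* Convert the hardware timestamp into the software clock domain using a
 * previously computed offset/ratio (the "synchronized time markers"). */
static uint64_t to_sw_ns(uint64_t hw_ts, uint64_t offset_ns, double ns_per_tick)
{
    return offset_ns + (uint64_t)(hw_ts * ns_per_tick);
}

/* Drain one DMA buffer worth of records and emit them for post-processing,
 * where they can be time-aligned with the software trace. */
static void drain(const struct fpga_trace_record *buf, size_t count,
                  uint64_t offset_ns, double ns_per_tick)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t ts = to_sw_ns(buf[i].hw_timestamp, offset_ns, ns_per_tick);
        printf("%" PRIu64 " src=%" PRIu16 " type=%" PRIu16
               " payload=0x%08" PRIx32 "\n",
               ts, buf[i].source_id, buf[i].event_type, buf[i].payload);
    }
}
```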
The Linux kernel has a mature, robust tracing infrastructure. Several, in fact. The focus today is not what to trace, but how to look at what has been traced.
There are several benchmarks that test the Linux kernel scheduler. Changing the scheduler algorithm can make one benchmark perform better while making another benchmark perform worse. These benchmarks only tell you how one scheduler compares to another; they do not give you any insight into why.
This session is a discussion on what analysis can be done from tracing the scheduler. This is more than just taking statistics; it is also about looking at the flow of events. What tooling can be added to measure the workflow of tasks and how the scheduler affects it? For instance, when one task wakes up another task and that task wakes up a third, is the scheduler's behavior causing this flow to be delayed? Can tooling show how migration, or the length of time spent preempted, causes latency or throughput issues?
Perhaps having this analysis can help one decide what the best scheduler algorithm is for a given workflow. Come and share your ideas, and let's make Linux have the best scheduler of any OS!
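As a toy illustration of one analysis in this vein, the sketch below measures the delay between a sched_waking event and the corresponding sched_switch from an already-decoded event stream (the event fields are simplified and hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PID 32768

enum ev_type { EV_SCHED_WAKING, EV_SCHED_SWITCH };

struct sched_event {
    enum ev_type type;
    uint64_t ts_ns;
    int pid;        /* woken task, or task being switched in */
};

static uint64_t waking_ts[MAX_PID];  /* 0 = no pending wakeup */

/* Feed decoded scheduler events in timestamp order; print the wakeup
 * latency each time a previously woken task finally gets the CPU. */
static void feed(const struct sched_event *ev)
{
    switch (ev->type) {
    case EV_SCHED_WAKING:
        if (ev->pid > 0 && ev->pid < MAX_PID)
            waking_ts[ev->pid] = ev->ts_ns;
        break;
    case EV_SCHED_SWITCH:
        if (ev->pid > 0 && ev->pid < MAX_PID && waking_ts[ev->pid]) {
            printf("pid %d scheduling latency: %llu ns\n", ev->pid,
                   (unsigned long long)(ev->ts_ns - waking_ts[ev->pid]));
            waking_ts[ev->pid] = 0;
        }
        break;
    }
}
```

Extending this per-event bookkeeping to wakeup chains, migrations and preemption lengths is exactly the kind of tooling the session aims to discuss.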
Modern systems generate massive amounts of trace data, which is valuable for performance analysis, debugging, and anomaly detection. However, existing analysis workflows, including popular visualization tools like Trace Compass, often require significant manual effort and domain expertise, and do not scale to complex or large datasets. Analysts are often overwhelmed by the volume and complexity of trace data, and even with advanced filtering or aggregation, meaningful insights can remain hidden or require tedious, repetitive work.
This talk presents our research on the Trace Abstraction and Analysis Framework (TAAF), a new approach that integrates knowledge graphs and large language models (LLMs) to bridge the gap between raw trace data and actionable insight. TAAF enables users to interact with their trace data through natural language queries, reducing the need for deep domain expertise or manual, low-level exploration. The framework builds a time-indexed knowledge graph from trace events, capturing both structural and contextual information, such as interactions between threads, CPUs, and key system attributes. Generative AI models then use these knowledge graphs to answer a wide range of questions, from root-cause diagnosis to performance comparisons, delivering human-readable explanations.
We will present our methodology, key design choices, and evaluation results, and discuss real-world scenarios where TAAF reduced manual effort and improved analysis accuracy. Our experiments show that combining knowledge graphs with generative AI improves answer quality and accuracy compared to manual methods or raw data alone. We will demonstrate use cases such as identifying performance bottlenecks, tracing causal chains, and generating summaries for user queries, all without the need for coding. We also examine the strengths and limitations of LLMs and knowledge graphs in practical trace analysis. This talk will benefit industry practitioners who need faster and more accessible diagnostics, as well as academic researchers interested in automated analysis, interactive tooling, and new AI-based methods for system trace data.
QEMU offers full-system deterministic "record and replay" features. Using its "rr" gdb stubs, it is possible to "time travel" to an arbitrary instruction in a record and to inspect the complete state of the emulated system.
Staying on top of a full-system time-travel debugging session can be tedious due to the complexity of some records (spanning kernel space, user space, multiple threads, etc.).
DejaView provides a set of QEMU plugins that generate high-level "bird's-eye view" traces out of a QEMU deterministic system record. Using Virtual Machine Introspection techniques, DejaView can be made operating-system aware and understands concepts such as the name and PID of the current thread.
Additionally, DejaView comes with a fork of Perfetto that leverages the trace visualization as a QEMU and gdb controller. This allows for iterative trace building (starting with a thread trace, then adding a function call graph for a specific time slice, then an instruction trace for a specific function call, etc.) and iterative debugging (argument and memory introspection, stable pointers, etc.).
Finally, taking advantage of Perfetto's web-based UI, DejaView also provides a VSCode plugin that integrates the trace recording and visualization into a broader IDE setup (with symbol lookup, easy recording, etc.).
We'll demonstrate the use of DejaView in the context of debugging Linux kernel crashes triggered by bug reproducers automatically extracted by syzkaller fuzzing, while emphasizing the broader usefulness of the approach.
With large computing systems, traces are getting bigger and bigger, and their analysis needs to scale accordingly. Trace analysis mainly involves deriving state information from trace events. These states need to be saved and displayed to the user at different zoom levels. With larger traces containing billions of events, an efficient framework is necessary to be able to move through time and display this information. Trace Compass is a trace analysis framework that is sufficiently modular that different state stores can be tested on different trace formats. We developed two additional state-storing mechanisms in Trace Compass and compared them to the one currently used. In this presentation, we will analyze these implementations and show their scalability against different trace samples (GPU traces, kernel traces, function entry/exit traces).
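A minimal sketch of what such a state store must answer is given below: state changes derived from events become intervals, and the UI asks "what was the state of attribute X at time t?", which a sorted, non-overlapping interval list can answer with a binary search. This is illustrative only, not Trace Compass code.

```c
#include <stddef.h>
#include <stdint.h>

struct state_interval {
    uint64_t start_ns;
    uint64_t end_ns;
    int attribute;   /* e.g. "CPU 3 status" in an attribute tree */
    int value;       /* e.g. RUNNING, IDLE, ...                  */
};

/* Intervals for one attribute, sorted by start time and non-overlapping:
 * return the interval containing time t, or NULL if none. */
static const struct state_interval *
query(const struct state_interval *ivals, size_t n, uint64_t t)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ivals[mid].end_ns < t)
            lo = mid + 1;
        else if (ivals[mid].start_ns > t)
            hi = mid;
        else
            return &ivals[mid];   /* start <= t <= end */
    }
    return NULL;
}
```

The state stores compared in the talk differ in how such intervals are laid out on disk and queried at scale, which is what the benchmarks across trace types measure.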
In this talk, we explore how Eclipse Trace Compass, using userspace tracing events (e.g., via LTTng-UST or custom application instrumentation), can uncover silent performance killers such as cache-related inefficiencies. We'll demonstrate how to visualize and correlate user-space events such as thread migrations, memory accesses, and scheduling delays to identify patterns symptomatic of cache-related inefficiencies. The talk will also cover the trace-event-logger, when to use it, and when to prefer semantically loaded tracers over free-form ones.
Attendees will gain practical strategies for deciding when to instrument, how to diagnose their applications, and how to face tricky issues that hide behind false positives in performance counters.
The Linux kernel and userspace provide numerous sources for performance data, including perf, ftrace, eBPF, and /proc interfaces. While effective individually, these sources are often siloed, requiring developers to write custom scripts to build a comprehensive view of system behaviour. This challenge is compounded by tools that prioritize the recording experience, often leaving visualization as a secondary concern.
Perfetto is an open-source suite of tools for performance analysis, featuring the Perfetto UI—an offline, web-based, timeline-oriented visualizer. A key design principle of the UI is its ability to analyze and visualize not only Perfetto's native format but also a range of widely used tracing formats, including Linux perf output, Firefox profiler data, and ftrace text logs.
This talk demonstrates how the Perfetto UI can be used to investigate complex performance problems, independent of the original data collection method. We will walk through several practical analysis scenarios:
- Analyze Linux perf CPU profiles: diving into individual samples and using the UI's time-based selection and dynamic flamegraph viewer to pinpoint where CPU time is spent.
- Investigate Linux scheduler traces (sched_switch, sched_waking): to diagnose why an application may be latent even when its CPU usage is low.
- Visualize custom user space trace data: specifically, instrumentation traces from a Rust program using the tracing crate, to understand what a program was doing over time.
- Demonstrate the UI's new superpower: merging multiple, disparate trace files onto a single timeline to create a holistic, multi-data-source, system-wide view of performance.
- Examine community tools built on Perfetto: including pthread_trace (for lock contention), systing (eBPF-based tracing), magic-trace (Intel Processor Trace), and viztracer (Python profiling).