09:00
60min
Registration, Coffee and Snacks
Bonjour 50
10:00
15min
Introduction
10:15
30min
Tracepoint Factory: Automated LTTng Instrumentation & Babeltrace2 Plugin Generation from header files
Thomas Applencourt

Generating and maintaining thousands of LTTng tracepoints for an existing C API, and then unpacking them with Babeltrace2, can quickly become tedious and error-prone. In this talk, we'll introduce two complementary open source tools that automate the end-to-end process:

  • h2yaml: A Clang-based Python utility that parses your C header files and emits a human-readable YAML description of the future tracepoints (names, arguments, types, etc.). Under the hood, it works around the undocumented idiosyncrasies of libclang's Python bindings.

  • Metababel: A Babeltrace2 plugin generator that consumes the YAML manifest produced by h2yaml and generates the corresponding unpacking code, making Babeltrace2 plugins a breeze to write.

We'll demonstrate both tools in action, using CUDA, MPI, and Intel Level Zero as concrete examples, and show how this workflow reduces development time, eliminates copy-paste errors, and keeps your trace instrumentation and Babeltrace2 plugins in sync with API changes.
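The generation step can be pictured with a toy sketch. Everything below is an assumption for illustration: the manifest layout, the `cuda` provider name, and the type-mapping rule are invented, and the real h2yaml/Metababel formats are richer.

```python
# Hypothetical sketch of the idea: turn a manifest (as h2yaml might emit
# from C headers) into LTTng TRACEPOINT_EVENT boilerplate. The manifest
# layout and type mapping here are illustrative assumptions.

MANIFEST = [  # one entry per C API function found in the headers
    {"name": "cuMemAlloc", "args": [("size_t", "bytesize"), ("void*", "dptr")]},
    {"name": "cuLaunchKernel", "args": [("unsigned", "gridDimX")]},
]

TPL = """TRACEPOINT_EVENT(
    {provider}, {name}_entry,
    TP_ARGS({tp_args}),
    TP_FIELDS({tp_fields})
)"""

def field_for(ctype: str, cname: str) -> str:
    # Pointers are recorded as hex addresses; other types as plain integers
    # in this toy model. A real generator maps C types far more carefully.
    if ctype.endswith("*"):
        return f"ctf_integer_hex(uintptr_t, {cname}, (uintptr_t){cname})"
    return f"ctf_integer({ctype}, {cname}, {cname})"

def generate(provider: str, entry: dict) -> str:
    tp_args = ", ".join(f"{t}, {n}" for t, n in entry["args"])
    tp_fields = " ".join(field_for(t, n) for t, n in entry["args"])
    return TPL.format(provider=provider, name=entry["name"],
                      tp_args=tp_args, tp_fields=tp_fields)

code = [generate("cuda", e) for e in MANIFEST]
print(code[0])
```

The payoff of the approach is visible even in this sketch: when the header changes, only the manifest is regenerated, and the tracepoint definitions follow mechanically.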

10:45
30min
Low-overhead Trace Collection for GPU Compute Kernels
Sébastien Darche

Graphics Processing Units (GPUs) have evolved over the past decades to become a key element of many computing systems, thanks to their affordable raw computing power. However, achieving the best performance they can offer requires intricate knowledge of their behavior. GPU developers usually rely on tools to analyze the performance of their code, but existing tools fall short, as they are unable to provide detailed data from the device.

In this presentation, we explore tracing methods for GPU compute kernels. We first discuss implementation methods for efficient tracing that alleviate its challenges. We then present possible tracing schemes for GPUs and study their performance on a GPU-accelerated computing benchmark. Lastly, we discuss the instrumentation challenges of SIMT code and how to best place tracepoints.

11:15
15min
Coffee Break
11:30
30min
LTTng Ecosystem Update
Mathieu Desnoyers

We present recent features, ongoing development, and the roadmap of upcoming releases of the LTTng project.

12:00
105min
Lunch (Not provided)
13:45
30min
Robust Lock-Free Ring-Buffer Protocol over Shared Memory
Olivier Dion

This presentation outlines an augmented lock‑free transaction protocol for LTTng-UST shared‑memory ring buffers that resolves long‑standing robustness problems when a producer stalls during a transaction (e.g., at breakpoints or when receiving a termination signal). By extending the transaction protocol with a sub-buffer ownership semantic and allowing the consumer daemon to safely abort or recover from incomplete transactions, the updated protocol prevents loss of events and eliminates situations where a destroyed session becomes stuck, waiting indefinitely for a stalled producer to finish its transaction, while preserving the high‑performance lock‑free nature of the original design. Extensive reproducible testing confirms the correctness of the protocol. The upcoming LTTng 2.15 release will integrate these improvements, delivering a more reliable and adaptable LTTng ecosystem.
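The ownership idea can be sketched with a toy model. This is our illustrative assumption of the mechanism, not the LTTng-UST implementation: a real ring buffer uses atomic operations on shared memory, not Python objects.

```python
# Toy model of sub-buffer ownership: a stalled producer no longer wedges
# the consumer, because ownership lets the consumer abort the incomplete
# transaction and move on. Illustrative only -- not the LTTng-UST code.

FREE, PRODUCER, CONSUMER = "free", "producer", "consumer"

class SubBuffer:
    def __init__(self):
        self.owner = FREE
        self.committed = False

    def cas_owner(self, expect, new):
        # Stand-in for an atomic compare-and-swap on the owner word.
        if self.owner == expect:
            self.owner = new
            return True
        return False

def producer_begin(sb):
    return sb.cas_owner(FREE, PRODUCER)   # claim the sub-buffer

def producer_commit(sb):
    sb.committed = True
    sb.owner = FREE                       # release ownership

def consumer_take(sb):
    # Fast path: sub-buffer is free (transaction finished or never started).
    if sb.cas_owner(FREE, CONSUMER):
        return "consumed" if sb.committed else "empty"
    # Producer stalled mid-transaction: steal ownership and discard the
    # incomplete sub-buffer instead of waiting forever.
    if sb.cas_owner(PRODUCER, CONSUMER):
        return "aborted"
    return "busy"

# A producer that stalls (e.g. hits a breakpoint) after beginning a transaction:
sb = SubBuffer()
assert producer_begin(sb)
print(consumer_take(sb))  # the consumer recovers instead of blocking
```

The key design point this models is that session teardown never has to wait on a producer: the consumer can always make progress by taking ownership.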

14:15
30min
Deferred stack traces, how they work and the issues they have
Steven Rostedt

A new method of creating user space stack traces that does not rely on frame pointers is being developed in the kernel. This method is called SFrames. SFrame data lives in a section of the ELF executable holding two tables: a binary search over the first table looks up the current instruction pointer, which leads to an entry in the second table telling the kernel how to find the return address of the current function. That return address can then be looked up in turn, walking up the callers to build a stack trace.

For the kernel to have access to these tables, which live in user space, it has to be able to handle a major page fault, because the table may still be on disk and must be read into memory via the page fault handler. There are only two locations where generic code in the kernel can know it is safe to handle a major page fault: when the task is entering or exiting the kernel.

As profilers usually trigger interrupts or NMIs to capture the stack trace, including the kernel stack, the kernel must defer reading the SFrame tables until the task goes back to user space. This talk will discuss the proposed implementation, the issues with handling deferred stack traces, and the various other capabilities this new feature will provide.
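The two-table lookup described above can be sketched in a few lines. This is a simplified illustration under invented table contents; the real `.sframe` encoding and its unwind rules are considerably more involved.

```python
# Simplified sketch of an SFrames-style two-table lookup. The table layout
# and the single "return address at CFA + offset" rule are illustrative
# assumptions, not the real .sframe format.
import bisect

# Table 1: function start addresses, sorted, each pointing into table 2.
func_starts = [0x1000, 0x1800, 0x2400]
fre_index   = [0, 1, 2]

# Table 2: for each function, the rule "return address lives at CFA + offset".
ra_offsets  = [-8, -8, -16]

def find_ra_rule(ip):
    # Binary-search the first table for the function containing `ip` ...
    i = bisect.bisect_right(func_starts, ip) - 1
    if i < 0:
        return None
    # ... then follow its index into the second table for the unwind rule.
    return ra_offsets[fre_index[i]]

def unwind(ip, cfa, memory):
    """One unwind step: apply the rule to read the caller's return address."""
    off = find_ra_rule(ip)
    return memory[cfa + off]

# Fake stack memory: the return address 0x1850 stored at CFA-8.
memory = {0xff00 - 8: 0x1850}
print(hex(unwind(0x1234, 0xff00, memory)))
```

Repeating the `unwind` step on each return address found is what produces the full stack trace; it is the `memory[...]` access that can major-fault, which is why the kernel must defer it.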

14:45
15min
Coffee Break
15:00
15min
Development of an AXI Tracing Solution for FPGAs on Heterogeneous Systems
Nicolas Deloumeau, Francois Tetreault

Tracing FPGAs in heterogeneous systems remains a significant challenge. Vendor-provided hardware tracing tools often have limited trace durations, which restrict visibility into complex and long-running issues. When combined with the inherent complexity of heterogeneous platforms, this limitation makes debugging and performance analysis especially difficult. Tracing software and hardware separately further complicates matters, as aligning software events with FPGA logic becomes a tedious and error-prone task, leaving designers without a clear understanding of system-wide behaviors.

This presentation introduces a solution that bridges this gap: a combined tracing approach that integrates FPGA and software event traces into a unified view. Our work comprises an FPGA core capable of tracing AXI transactions, signals, and state machines, along with a software component that collects and processes the trace data. The FPGA core timestamps events and transmits them over DMA to software, where they are stored with synchronized time markers for post-processing. The result is a cohesive, time-aligned trace that can be analyzed in conjunction with software traces, offering designers deeper insight into system interactions and making root cause analysis more accessible.

15:15
105min
Unconference
09:00
60min
Coffee and Snacks
10:00
15min
Introduction
10:15
30min
Analyzing scheduler traces
Steven Rostedt

The Linux kernel has a mature, robust tracing infrastructure. Several, in fact. The focus today is not what to trace, but how to look at what has been traced.

There are several benchmarks that test the Linux kernel scheduler. Changing the scheduler algorithm can make one benchmark perform better while making another perform worse. These benchmarks only tell you how one scheduler compares to another; they do not give you any insight into why.

This session is a discussion on what analysis can be done from tracing the scheduler. This is more than just collecting statistics; it means looking at the flow of events as well. What tooling can be added to measure the workflow of tasks and how the scheduler affects it? For instance, when one task wakes up another task and that task wakes up a third, is the scheduler's behavior causing this flow to be delayed? Can tooling show how migration, or the length of time spent preempted, causes latency or throughput issues?

Perhaps this analysis can help one decide which scheduler algorithm is best for a given workflow. Come and share your ideas, and let's give Linux the best scheduler of any OS!
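One concrete instance of the kind of analysis discussed above is measuring wakeup-to-run latency from `sched_waking` and `sched_switch` events. The event tuples below are a simplified stand-in for a real ftrace stream.

```python
# Compute per-task wakeup-to-run latency from a simplified event stream.
# (timestamp_us, event, pid) tuples stand in for real ftrace records.

events = [
    (100, "sched_waking",  42),
    (130, "sched_switch",  42),   # task 42 finally gets the CPU: 30us latency
    (200, "sched_waking",  43),
    (205, "sched_switch",  43),   # 5us latency
    (300, "sched_waking",  42),
    (390, "sched_switch",  42),   # 90us latency -- worth investigating
]

def wakeup_latencies(events):
    pending, latencies = {}, {}
    for ts, ev, pid in events:
        if ev == "sched_waking":
            pending.setdefault(pid, ts)          # first wakeup wins
        elif ev == "sched_switch" and pid in pending:
            latencies.setdefault(pid, []).append(ts - pending.pop(pid))
    return latencies

lat = wakeup_latencies(events)
print(lat)            # {42: [30, 90], 43: [5]}
print(max(lat[42]))   # 90
```

Extending this from per-task statistics to chains (task A wakes B, B wakes C) is exactly the flow-of-events analysis the session proposes to discuss.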

10:45
30min
TAAF: A Knowledge Graph and LLM-Driven Framework for Trace Abstraction and Analysis
Alireza Ezaz, Naser Ezzati-Jivan

Modern systems generate massive amounts of trace data, which is valuable for performance analysis, debugging, and anomaly detection. However, existing analysis workflows, including popular visualization tools like Trace Compass, often require significant manual effort and domain expertise, and do not scale to complex or large datasets. Analysts are often overwhelmed by the volume and complexity of trace data, and even with advanced filtering or aggregation, meaningful insights can remain hidden or require tedious, repetitive work.

This talk presents our research on the Trace Abstraction and Analysis Framework (TAAF), a new approach that integrates knowledge graphs and large language models (LLMs) to bridge the gap between raw trace data and actionable insight. TAAF enables users to interact with their trace data through natural language queries, reducing the need for deep domain expertise or manual, low-level exploration. The framework builds a time-indexed knowledge graph from trace events, capturing both structural and contextual information, such as interactions between threads, CPUs, and key system attributes. Generative AI models then use these knowledge graphs to answer a wide range of questions, from root-cause diagnosis to performance comparisons, delivering human-readable explanations.

We will present our methodology, key design choices, and evaluation results, and discuss real-world scenarios where TAAF reduced manual effort and improved analysis accuracy. Our experiments show that combining knowledge graphs with generative AI improves answer quality and accuracy compared to manual methods or raw data alone. We will demonstrate use cases such as identifying performance bottlenecks, tracing causal chains, and generating summaries for user queries, all without the need for coding. We also examine the strengths and limitations of LLMs and knowledge graphs in practical trace analysis. This talk will benefit industry practitioners who need faster and more accessible diagnostics, as well as academic researchers interested in automated analysis, interactive tooling, and new AI-based methods for system trace data.
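The time-indexed knowledge graph idea can be sketched minimally. This is our assumption of the concept for illustration, not the actual TAAF code: edges carry time intervals, so a query can be scoped to a window before its results are handed to a language model.

```python
# Minimal sketch of a time-indexed knowledge graph built from trace events.
# Edge layout and relation names are illustrative assumptions.

events = [  # (start, end, thread, cpu)
    (0, 10, "compiler", 0),
    (10, 25, "indexer", 0),
    (5, 30, "compiler", 1),
]

# Graph as an edge list: (subject, relation, object, t_start, t_end)
graph = [(thr, "ran_on", f"cpu{cpu}", s, e) for s, e, thr, cpu in events]

def query(graph, relation, obj, t0, t1):
    """Subjects with a `relation` edge to `obj` overlapping [t0, t1]."""
    return sorted({s for s, r, o, ts, te in graph
                   if r == relation and o == obj and ts < t1 and te > t0})

# "Which threads ran on cpu0 between t=8 and t=12?"
print(query(graph, "ran_on", "cpu0", 8, 12))
```

A natural-language front end would translate a user question into such a scoped query, then let the LLM explain the (small, relevant) result set rather than the raw trace.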

11:15
15min
Coffee Break
11:30
30min
DejaView: time-travel debugging and tracing
Florent Revest

QEMU offers full-system deterministic "record and replay" features. Using "rr" gdb stubs, it makes it possible to "time travel" to an arbitrary instruction in a record and to inspect the complete state of the emulated system.

Staying on top of a full-system time travel debugging session can be tedious due to the complexity of some records (spanning kernel space, user space, multiple threads, etc.).

DejaView provides a set of QEMU plugins that generate high-level "bird's-eye view" traces out of a QEMU deterministic system record. Using Virtual Machine Introspection techniques, DejaView can be made operating system aware, understanding concepts such as the name and PID of the current thread.

Additionally, DejaView comes with a fork of Perfetto that leverages the trace visualization as a QEMU and gdb controller. This allows for iterative trace building (starting with a thread trace, then adding a function call graph for a specific time slice, then an instruction trace for a specific function call, etc.) and iterative debugging (argument and memory introspection, stable pointers, etc.).

Finally, taking advantage of Perfetto's web-based UI, DejaView also provides a VSCode plugin that integrates trace recording and visualization into a broader IDE setup (with symbol lookup, easy recording, etc.).

We'll demonstrate the use of DejaView in the context of debugging Linux kernel crashes triggered by bug reproducers automatically extracted by syzkaller fuzzing while emphasizing the broader usefulness of the approach.

12:00
105min
Lunch (Not provided)
13:45
30min
Scalable Trace Analysis Framework with Trace Compass
Arnaud Fiorini

With large computing systems, traces are getting bigger and bigger, and their analysis needs to scale accordingly. Trace analysis mainly necessitates deriving state information from trace events. These states need to be saved and displayed to the user at different zoom levels. With larger traces containing billions of events, an efficient framework is necessary to be able to move in time and display this information. Trace Compass is a trace analysis framework that is sufficiently modular that different state stores can be tested on different trace formats. We developed two additional state-storing mechanisms in Trace Compass and compared them to the one currently used. In this presentation, we will analyze these implementations and show their scalability against different trace samples (GPU traces, kernel traces, function entry/exit traces).
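The interface being benchmarked can be sketched as follows. This in-memory list is only an illustration of the state-system abstraction; the real Trace Compass state stores (such as the on-disk history tree) are built precisely to avoid this naive linear scan.

```python
# Illustrative sketch of an interval-based state store: states derived from
# trace events are kept as (attribute, start, end, value) intervals and
# queried at a timestamp. Naive in-memory version for illustration only.

class StateStore:
    def __init__(self):
        self.intervals = []            # (attribute, start, end, value)

    def insert(self, attribute, start, end, value):
        self.intervals.append((attribute, start, end, value))

    def query(self, attribute, t):
        """State of `attribute` at time t, or None if no interval covers t."""
        for attr, s, e, v in self.intervals:
            if attr == attribute and s <= t < e:
                return v
        return None

# States derived from events, e.g. a thread's scheduling state over time:
store = StateStore()
store.insert("thread/42/status", 0, 50, "RUNNING")
store.insert("thread/42/status", 50, 80, "BLOCKED")
print(store.query("thread/42/status", 60))
```

Comparing state-storing mechanisms means swapping the backing structure behind this same insert/query interface and measuring query latency at different zoom levels.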

14:15
30min
Cache Me If You Can: Diagnosing Cache Allocation Issues with Eclipse Trace Compass
Matthew Khouzam

In this talk, we explore how Eclipse Trace Compass, using userspace tracing events (e.g., via LTTng-UST or custom application instrumentation), can uncover cache allocation issues, those silent performance killers. We'll demonstrate how to visualize and correlate user-space events such as thread migrations, memory accesses, and scheduling delays to identify patterns symptomatic of cache-related inefficiencies. We'll also discuss the trace-event-logger, when to use it, and when to prefer semantically loaded tracers over free-form events.

Attendees will gain practical strategies for deciding when to instrument their applications, how to diagnose them, and how to face tricky issues that hide behind false positives in performance counters.

14:45
15min
Coffee Break
15:00
30min
Perfetto: The Swiss Army Knife of Linux Client/Embedded Tracing
Lalit Maganti

The Linux kernel and userspace provide numerous sources for performance data, including perf, ftrace, eBPF, and /proc interfaces. While effective individually, these sources are often siloed, requiring developers to write custom scripts to build a comprehensive view of system behaviour. This challenge is compounded by tools that prioritize the recording experience, often leaving visualization as a secondary concern.

Perfetto is an open-source suite of tools for performance analysis, featuring the Perfetto UI—an offline, web-based, timeline-oriented visualizer. A key design principle of the UI is its ability to analyze and visualize not only Perfetto's native format but also a range of widely used tracing formats, including Linux perf output, Firefox profiler data, and ftrace text logs.

This talk demonstrates how the Perfetto UI can be used to investigate complex performance problems, independent of the original data collection method. We will walk through several practical analysis scenarios:

  • Analyzing Linux perf CPU profiles: diving into individual samples and using the UI's time-based selection and dynamic flamegraph viewer to pinpoint where CPU time is spent.
  • Investigating Linux scheduler traces (sched_switch, sched_waking): diagnosing why an application may be latent even when its CPU usage is low.
  • Visualizing custom user space trace data: specifically, instrumentation traces from a Rust program using the tracing crate, to understand what a program was doing over time.
  • Demonstrating the UI's new superpower: merging multiple, disparate trace files onto a single timeline to create a holistic, multi-source, system-wide view of performance.
  • Examining community tools built on Perfetto: including pthread_trace (for lock contention), systing (eBPF-based tracing), magic-trace (Intel Processor Trace), and viztracer (Python profiling).
15:30
15min
Conclusion
15:45
75min
Unconference