bpf-iotrace: Defining Requirements
by Mike Przybylski
Getting organized
Any software engineering task more complex than a trivial bug fix needs a checklist. This is especially true of new projects and new features.
The checklist can take any form that works most efficiently for a developer, their team, or their organization. Jira issues, ClickUp tasks, or a text or Markdown file on your desktop are all viable options¹, but writing a good checklist has an even more fundamental prerequisite: good requirements.
Requirements describe a project’s desired end result. Good requirements are a set of concise and unambiguous statements that focus on what a project or product must do. They shouldn’t prescribe how a developer or team implements them. Those decisions should be left to the experts on the implementing team as much as possible.
You may be fortunate enough to work with a product or project manager or engineering lead who understands their problem space really well, who can just hand you a nice, cleanly written, ready-to-implement list of requirements. If not, don’t fret.
If you were given any written requirements at all, read them carefully, and do your best to get written clarification of any ambiguities or inconsistencies you find. Once you feel like you fully understand those requirements, you can break them down into a task list in whatever way works best for you or your team.
If you were given a complex software project without any requirements, kick off your design process by writing requirements for yourself based on your understanding of the problem. Then check in with your stakeholders to see if you are on the right track. Once you are, you can confidently translate them to a task list as you move on to design and implementation. This approach also works when you are working solo and developing your own ideas.
I like to structure my requirements like an IETF RFC as much as I can. Thinking about what functionality a project must deliver in terms of RFC 2119 keywords has really helped me separate the “whats” of the deliverable from the “hows” of the implementation.
The working draft of bpf-iotrace’s requirements may be found here, and they are discussed in greater detail below.
bpf-iotrace requirements
Target operating system
Word-of-mouth suggests that Linux kernel version 4.14 is the oldest practical target for a BPF-based utility. bpf-iotrace will also be linked against GNU libc (glibc) v2.26, so any Linux host with that version of glibc or newer should be compatible.
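As a rough illustration of how those two minimums could be checked on a host, here is a minimal Python sketch. The helper names (`parse_version`, `meets_minimum`, `host_is_supported`) are hypothetical and not part of bpf-iotrace; the only inputs are the kernel release string and the libc version that Python's standard `platform` module reports.

```python
import platform
import re
from typing import Tuple

# Minimums taken from the requirements above.
MIN_KERNEL = (4, 14)
MIN_GLIBC = (2, 26)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted numeric part of a version string.

    For example, '5.15.0-91-generic' becomes (5, 15, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if `version` is at least `minimum` (tuple comparison)."""
    return parse_version(version) >= minimum


def host_is_supported() -> bool:
    """Check the running host against both minimums."""
    if platform.system() != "Linux":
        return False
    libc_name, libc_version = platform.libc_ver()
    return (
        meets_minimum(platform.release(), MIN_KERNEL)
        and libc_name == "glibc"
        and meets_minimum(libc_version, MIN_GLIBC)
    )
```

A version check like this only confirms the baseline. It says nothing about which BPF features a particular distribution kernel was actually built with.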
Implementation languages
There is a moderate amount of coupling between a programming language and its target environment(s). Therefore, the languages to be used on a software project are one of the very few “hows” that it makes sense to specify as a requirement. There may also be personnel, time, or other resource constraints that dictate the selection of a particular programming language.
In the case of bpf-iotrace, one of its main goals is to serve as a demonstration project for libbpf-based application and modern C++ development techniques. So bpf-iotrace will be written in C++, C, and eBPF assembly.
Output
bpf-iotrace should be able to take an application that is essentially a black box and see where and how it is reading and writing data. Its output must allow administrators and dev-ops engineers to create filesystems with optimal configurations for hosting that application. bpf-iotrace should also provide the insights necessary to back those filesystems with the most cost-effective storage technologies. Its output must also inform benchmarks that can rapidly qualify those filesystems.
Given that most applications, and especially database servers and NoSQL data stores, often interact with large directory trees, bpf-iotrace’s output must be formatted to allow analysis of I/O data for a top-level directory, an individual file, and everything in between.²
This means bpf-iotrace must save its metrics on a per-file basis, but containers add an extra wrinkle: how does bpf-iotrace reliably identify files in different containers when they have the same name (e.g. ib_logfile0 in two separate MySQL Server containers on the same Docker host or Kube node)? To resolve that ambiguity, bpf-iotrace must use the mount namespace ID (i.e. the result of readlink /proc/<pid>/ns/mnt) and an absolute path to uniquely identify a file.
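The identification scheme above can be sketched in a few lines of Python. This is only an illustration, not bpf-iotrace's implementation: `FileKey`, `parse_mnt_ns_link`, `make_key`, and `file_key` are hypothetical names. It relies on the fact that on Linux the `/proc/<pid>/ns/mnt` symlink reads as `mnt:[<inode number>]`.

```python
import os
import posixpath
import re
from typing import NamedTuple

# The namespace symlink target looks like "mnt:[4026531840]".
_MNT_NS_LINK = re.compile(r"^mnt:\[(\d+)\]$")


class FileKey(NamedTuple):
    """Hypothetical per-file identity: mount namespace ID plus absolute path."""
    mnt_ns_id: int
    path: str


def parse_mnt_ns_link(link: str) -> int:
    """Extract the namespace ID from the target of /proc/<pid>/ns/mnt."""
    match = _MNT_NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected mount namespace link: {link!r}")
    return int(match.group(1))


def make_key(mnt_ns_id: int, path: str) -> FileKey:
    """Build a key from a namespace ID and an absolute path in that namespace."""
    if not posixpath.isabs(path):
        raise ValueError(f"path must be absolute: {path!r}")
    # normpath only tidies "." and ".." components; it does not resolve symlinks.
    return FileKey(mnt_ns_id, posixpath.normpath(path))


def file_key(pid: int, path: str) -> FileKey:
    """Build a key for `path` as seen by process `pid` (Linux only)."""
    link = os.readlink(f"/proc/{pid}/ns/mnt")
    return make_key(parse_mnt_ns_link(link), path)
```

With keys like these, `ib_logfile0` at the same absolute path in two containers maps to two distinct entries because the namespace IDs differ.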
Per-file metrics
- Since bpf-iotrace will be used for troubleshooting, it shall record an error count for each I/O system call.
- A FIO benchmark can be configured to generate a distribution of I/O sizes in a variety of system calls, so bpf-iotrace shall record a histogram of size arguments and return values (usually the number of bytes read or written) for each I/O system call.
- bpf-iotrace shall record a histogram of sequence lengths when it identifies runs of sequential read or write calls.
- bpf-iotrace shall also record the following data on a per-file basis:
  - Number of times the file was opened
  - Number of times the file was closed
  - Number of errors on a per-system-call basis
  - A histogram of the number of bytes written between fsync() calls
  - Total bytes read
  - The entry time for the first read system call
  - The return time for the last successful read system call
  - Total bytes written
  - The entry time for the first write system call
  - The return time for the last successful write system call
  - Total bytes written before the first fsync() system call if fsync() has only been called once on the file descriptor, or between the two most recent fsync() calls if fsync() has been called more than once
  - The return time of the most recent fsync() system call
  - The number of times a write operation was issued for the same file offset as a previous write
  - A histogram of fsync() latencies
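To make the two histogram requirements more concrete, here is a small Python sketch. It assumes power-of-two buckets for sizes, which is common in BPF tools but is not something the requirements mandate, and it assumes that a "run" means two or more back-to-back calls where each starts at the offset where the previous one ended. All names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Optional


def log2_bucket(value: int) -> int:
    """Bucket b holds values in [2**b, 2**(b + 1)); bucket 0 also holds 0."""
    return max(value, 1).bit_length() - 1


def size_histogram(sizes: Iterable[int]) -> Counter:
    """Count I/O sizes per power-of-two bucket."""
    return Counter(log2_bucket(size) for size in sizes)


@dataclass
class SequentialRunTracker:
    """Histogram of sequential run lengths for one file and one direction."""
    run_lengths: Counter = field(default_factory=Counter)
    _next_offset: Optional[int] = None
    _run: int = 0

    def record(self, offset: int, nbytes: int) -> None:
        """Record one completed call that moved `nbytes` starting at `offset`."""
        if self._next_offset is not None and offset == self._next_offset:
            self._run += 1          # continues the current run
        else:
            self.flush()            # close the previous run, if any
            self._run = 1
        self._next_offset = offset + nbytes

    def flush(self) -> None:
        """Close the current run; only runs of two or more calls are counted."""
        if self._run > 1:
            self.run_lengths[self._run] += 1
        self._run = 0
        self._next_offset = None
```

Calling `flush()` when the file is closed, or when tracing stops, makes sure the final run is counted.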
The above metrics should allow a user to characterize an application’s I/O including calculating read and write bandwidths for a single file or an entire directory tree.
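As a sketch of the bandwidth calculation, assuming timestamps are recorded in nanoseconds and per-file records are keyed by absolute path, the totals for a directory tree can be merged and divided by the elapsed window. `IoTotals`, `merge_under`, and `bandwidth_bytes_per_sec` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IoTotals:
    """Per-file totals for one direction (read or write)."""
    total_bytes: int
    first_entry_ns: int   # entry time of the first call
    last_return_ns: int   # return time of the last successful call


def bandwidth_bytes_per_sec(totals: IoTotals) -> Optional[float]:
    """Average bandwidth over the window from first entry to last return."""
    elapsed_ns = totals.last_return_ns - totals.first_entry_ns
    if elapsed_ns <= 0:
        return None
    return totals.total_bytes * 1e9 / elapsed_ns


def merge_under(directory: str, per_file: Dict[str, IoTotals]) -> Optional[IoTotals]:
    """Combine the totals of every file below `directory`."""
    prefix = directory.rstrip("/") + "/"
    selected = [t for path, t in per_file.items() if path.startswith(prefix)]
    if not selected:
        return None
    return IoTotals(
        total_bytes=sum(t.total_bytes for t in selected),
        first_entry_ns=min(t.first_entry_ns for t in selected),
        last_return_ns=max(t.last_return_ns for t in selected),
    )
```

Note that this is an average over wall-clock time, so idle periods between the first and last call lower the result.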
Instrumentation
bpf-iotrace shall include BPF instrumentation for the following system calls:
- close()
- fsync()
- sendfile()
- pread64()
- preadv()
- preadv2()
- read()
- readv()
- pwrite64()
- pwritev()
- pwritev2()
- write()
- writev()
bpf-iotrace shall also include the tracepoint and/or kprobe instrumentation necessary to track asynchronous I/O operations.
The list above is intended to capture every I/O function an application might use to read from or write to a filesystem. If you think it is missing anything, please let me know here.
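One way to turn that list into attach points is to derive syscall tracepoint names from it. The sketch below generates libbpf-style section names of the form `tracepoint/syscalls/sys_enter_<name>` and `tracepoint/syscalls/sys_exit_<name>`. It assumes the syscall tracepoints are the chosen attach mechanism, and it assumes the x86_64 naming where the sendfile tracepoints carry the name `sendfile64`. Both assumptions should be checked against the `events/syscalls` directory of tracefs on the target kernel.

```python
from typing import Tuple

# System calls listed in the requirements above.
TRACED_SYSCALLS = (
    "close", "fsync", "sendfile",
    "pread64", "preadv", "preadv2", "read", "readv",
    "pwrite64", "pwritev", "pwritev2", "write", "writev",
)

# Assumption: on x86_64 the sendfile tracepoints are named after sendfile64.
TRACEPOINT_ALIASES = {"sendfile": "sendfile64"}


def tracepoint_sections(syscall: str) -> Tuple[str, str]:
    """Return the (enter, exit) section names for one system call."""
    name = TRACEPOINT_ALIASES.get(syscall, syscall)
    return (
        f"tracepoint/syscalls/sys_enter_{name}",
        f"tracepoint/syscalls/sys_exit_{name}",
    )


if __name__ == "__main__":
    for syscall in TRACED_SYSCALLS:
        enter, exit_ = tracepoint_sections(syscall)
        print(f"{syscall:10s} {enter}  {exit_}")
```

Pairing an enter and an exit hook per call matches the metrics above, which need both the size argument at entry and the return value and timestamp at exit.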
Filtering
Another interesting feature of BPF programs is that they will execute any and every time the tracepoint or kernel function they were attached to is encountered. This can be incredibly powerful in terms of visibility, but it can also be incredibly noisy. In the case of read(), readv(), write(), writev(), and sendfile(), those functions can interact with regular files or with file descriptors for sockets (network I/O). At the very least, bpf-iotrace instrumentation must always ignore network I/O. More generally, bpf-iotrace must only trace I/O operations on named, regular files.
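The classification rule can be illustrated from user space with `fstat()`. This is only a sketch of the rule, not of the in-kernel check, which a BPF program would have to make against the kernel's own file structures. It also assumes that "named" can be approximated as "has at least one directory entry", i.e. a link count greater than zero. The function name is hypothetical.

```python
import os
import stat


def is_named_regular_file(fd: int) -> bool:
    """Return True if `fd` refers to a regular file that still has a name.

    Sockets, pipes, and character devices fail the S_ISREG test. Unlinked
    or anonymous temporary files fail the link-count test (an assumption
    about what "named" means here).
    """
    st = os.fstat(fd)
    return stat.S_ISREG(st.st_mode) and st.st_nlink > 0
```

A socket or pipe descriptor passed to read() or write() would be rejected by this test, while a descriptor for an ordinary file on disk would pass.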
Typically, a performance or dev-ops engineer is only going to be interested in the behavior of a small subset of commands or process IDs running on a system. Therefore, bpf-iotrace must provide options that allow a user to specify a set of process IDs or commands to be monitored exclusively.
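A minimal sketch of such options follows. The flag names `--pid` and `--comm` are hypothetical, since the requirements do not name them. One detail worth accounting for is that the kernel stores a task's command name in a 16-byte buffer including the terminating NUL, so names longer than 15 bytes are truncated before they can be compared.

```python
import argparse
from typing import AbstractSet, Optional, Sequence

TASK_COMM_LEN = 16  # kernel command-name buffer size, including the NUL


def parse_filter_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse hypothetical --pid and --comm options; both may be repeated."""
    parser = argparse.ArgumentParser(prog="bpf-iotrace")
    parser.add_argument("--pid", type=int, action="append", default=[],
                        help="trace only this process ID")
    parser.add_argument("--comm", action="append", default=[],
                        help="trace only processes with this command name")
    return parser.parse_args(argv)


def normalize_comm(name: str) -> str:
    """Truncate a command name the way the kernel would (ASCII names assumed)."""
    return name[:TASK_COMM_LEN - 1]


def should_trace(pid: int, comm: str,
                 pids: AbstractSet[int], comms: AbstractSet[str]) -> bool:
    """With no filters, trace everything; otherwise match either filter."""
    if not pids and not comms:
        return True
    return pid in pids or comm in comms
```

For example, `bpf-iotrace --pid 1234 --comm mysqld` would restrict tracing to process 1234 plus any process whose command name is `mysqld`.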
TUI reporting
I am thinking of adding text-based user interface (TUI) reporting and analysis to bpf-iotrace as a future enhancement. If you have a wish list for functionality that you would like it to include, please let me know here.
Time series I/O metrics
I am thinking of adding time-series file I/O metrics to bpf-iotrace as a future enhancement. If you have ideas or requests for lightweight ways to record that data, and the specific time series metrics you would like to see included, please let me know here.
Up next
Creating a Build Environment for libbpf-based Programs
References
- https://fio.readthedocs.io/en/latest/fio_doc.html
- https://www.ietf.org/standards/rfcs/
- https://www.ietf.org/rfc/rfc2119.txt
Footnotes
1. I’m well aware that these aren’t the only options for organizing a complex project, and if you have another favorite, let me know here.
2. Note that we are not specifying the low-level format for bpf-iotrace’s output file(s). Since there are no external constraints like a customer requirement or a downstream platform consuming bpf-iotrace’s data, we can leave it up to the implementors to select a format that supports the analysis requirements mentioned above. If there were customer or consumer constraints, then it would be perfectly appropriate to specify them as requirements.