Chapter 4: Integrating Linux Commands with Click
Alfredo Deza
Learning Objectives
By the end of this chapter, you will be able to:
- Master subprocess integration: Safely execute and capture output from system commands
- Implement robust parsing strategies: Parse command output from simple text to complex formats
- Apply security best practices: Avoid shell injection vulnerabilities and handle user input safely
- Design resilient CLI tools: Handle command failures, timeouts, and edge cases gracefully
Prerequisites
- Previous Chapter: Chapter 3 (IPython development workflow)
- System Knowledge: Basic understanding of Linux/Unix command-line tools
- Python Skills: Functions, error handling, and string manipulation
Chapter Overview
Estimated Time: 75 minutes
Hands-on Labs: 1 comprehensive system integration exercise
Assessment: 5-question knowledge check
Integrating system commands with Python CLI tools requires careful attention to security, error handling, and output parsing. This chapter covers professional-grade techniques for building robust command-line tools that interact seamlessly with the underlying system.
Python comes with many utilities to interact with a system: listing directories, getting file information, and even lower-level operations like socket communication. There are situations where these are not sufficient or simply don't solve the right problem. I remember a time when I worked with Ceph (a distributed file storage system) and had to interact with different disk utilities. There are quite a few tools to retrieve device information, like blkid, lsblk, and parted. They all have some overlap and some distinct features. I had to retrieve specific information from a device that one tool wouldn't have, and then go to a different tool to retrieve the rest.
To make matters worse, and because Ceph supports various Linux distributions, some tools didn't have the features I needed on older versions of a particular Linux distro. What a problem. I ended up creating utilities that would try one tool first and then fall back to the others if it failed, with an order of preference. The result, however, ended up being very robust and resilient to all of these differences. Along the way, a few essential pieces need to be in place, and in this chapter, I go through the pieces that make the interaction seamless, practical, and extraordinarily resilient.
Understand subprocess
If you search on the internet for how to run a system command from Python, you
shouldn't be surprised to find hundreds (thousands?) of examples that may show
something like this one:
>>> import subprocess
>>> subprocess.call(['ls', '-l'])
total 13512
drwxrwxr-x 9 root admin 288 Feb 11 13:13 itcl4.1.1
-rwxrwxr-x 1 root admin 2752568 Dec 18 14:06 libcrypto.1.1.dylib
-rwxrwxr-x 1 root admin 88244 Dec 18 14:07 libformw.5.dylib
-rwxrwxr-x 1 root admin 43080 Dec 18 14:07 libmenuw.5.dylib
-rwxrwxr-x 1 root admin 408344 Dec 18 14:07 libncursesw.5.dylib
-rwxrwxr-x 1 root admin 25924 Dec 18 14:07 libpanelw.5.dylib
-rwxrwxr-x 1 root admin 529676 Dec 18 14:06 libssl.1.1.dylib
-r-xrwxr-x 1 root admin 1441716 Dec 18 14:06 libtcl8.6.dylib
drwxrwxr-x 5 root admin 160 Dec 18 14:06 tcl8
drwxrwxr-x 17 root admin 544 Feb 11 13:13 tcl8.6
-rw-rw-r-- 1 root admin 8275 Dec 18 14:06 tclConfig.sh
drwxrwxr-x 5 root admin 160 Feb 11 13:13 thread2.8.2
-rw-rw-r-- 1 root admin 4351 Dec 18 14:07 tkConfig.sh
0
If the example is trying to capture the results and assign them to a variable, it may use something like this instead:
>>> from subprocess import Popen, PIPE
>>> process = Popen(['ls', '-l'], stdout=PIPE)
>>> output = process.stdout.read()
>>> for line in output.decode('utf-8').split('\n'):
...     print(line)
...
total 13512
drwxrwxr-x 9 root admin 288 Feb 11 13:13 itcl4.1.1
-rwxrwxr-x 1 root admin 2752568 Dec 18 14:06 libcrypto.1.1.dylib
-rwxrwxr-x 1 root admin 88244 Dec 18 14:07 libformw.5.dylib
-rwxrwxr-x 1 root admin 43080 Dec 18 14:07 libmenuw.5.dylib
-rwxrwxr-x 1 root admin 408344 Dec 18 14:07 libncursesw.5.dylib
-rwxrwxr-x 1 root admin 25924 Dec 18 14:07 libpanelw.5.dylib
-rwxrwxr-x 1 root admin 529676 Dec 18 14:06 libssl.1.1.dylib
-r-xrwxr-x 1 root admin 1441716 Dec 18 14:06 libtcl8.6.dylib
drwxrwxr-x 5 root admin 160 Dec 18 14:06 tcl8
drwxrwxr-x 17 root admin 544 Feb 11 13:13 tcl8.6
-rw-rw-r-- 1 root admin 8275 Dec 18 14:06 tclConfig.sh
drwxrwxr-x 5 root admin 160 Feb 11 13:13 thread2.8.2
-rw-rw-r-- 1 root admin 4351 Dec 18 14:07 tkConfig.sh
The output variable saves the returned string from process.stdout.read(); then it gets decoded (read() returns bytes, not a string), and finally, the loop prints the result. This is not very useful, except for demonstrating how to keep the output around for processing.
These examples are everywhere. Some go further into checking exit status codes and waiting for the command to finish, but they are lacking in crucial factors of correctly (and safely) interacting with system commands. These are a few questions that come to mind and need answering when crafting these types of interactions:
- What happens if the tool does not exist or is not in the path?
- What should be done when the tool reports an error?
- If the tool takes too long to run, how do I know if it is hanging or doing
actual work?
Running system commands looks easy, but it is vital to build resilient interfaces so that failures are easier to address.
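As a first sketch, those three questions can be answered with subprocess.run (available in the standard library since Python 3.5; capture_output since 3.7). The safe_run name and the 127/124 fallback codes below are my own convention, borrowed from common shell exit-code practice:

```python
import subprocess

def safe_run(command, timeout=30):
    """Run a command while handling the three common failure modes:
    a missing executable, a non-zero exit, and a hanging process."""
    try:
        result = subprocess.run(
            command,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes into str
            timeout=timeout,      # raise instead of hanging forever
        )
    except FileNotFoundError:
        # The tool does not exist or is not in the PATH
        return '', f'{command[0]}: command not found', 127
    except subprocess.TimeoutExpired:
        return '', f'{command[0]}: timed out after {timeout}s', 124
    return result.stdout, result.stderr, result.returncode
```

The caller always gets the same three-item shape back, so handling a missing tool looks no different from handling a failed run.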
There are primarily two types of system calls you interact with: one where you don't care about the output, like starting a web server, and another that produces useful output that needs to be processed. You need to apply a strategy to each one, depending on the use case, to ensure a transparent interface. As long as there is consistency, these system interactions are easy to work with.
Parsing Results
When the output of a system command gets saved for post-processing, parsing code must be implemented. The parsing can be as easy as checking whether a specific word appears in the output as a whole. For more involved output, line-by-line parsing has to be crafted. The more difficult the output, the more chance there is for brittleness in the handling. The parsing strategy should go from simple (more robust) to complex (prone to breakage). One common thought is to throw a regular expression at anything that comes out of a system command, but that is usually my last resort, as regular expressions are incredibly hard to debug compared to other approaches.
If the tool you are calling from Python has a way to produce a machine-readable format, use it. It is almost always your best chance at an easy path to parsing results. The machine-readable format can be many things; most commonly it is JSON or CSV (Comma-Separated Values). Python has native support for loading these and interacting with them easily.
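For example, loading JSON output takes a single call to json.loads. The sample below uses captured output rather than a live command, since (as discussed later in this chapter) flags like lsblk's --json are not available on every distribution:

```python
import json

# Output captured from a tool that supports a JSON flag, in the
# shape that `lsblk --json` produces on newer systems
raw_output = '''
{"blockdevices": [
    {"name": "/dev/sda", "type": "disk"},
    {"name": "/dev/sda1", "type": "part"}
]}
'''

# No line-by-line parsing needed: the structure is already there
devices = json.loads(raw_output)['blockdevices']
partitions = [d['name'] for d in devices if d['type'] == 'part']
```

Compare this one-liner to the hand-rolled parsing later in the chapter; the difference is why machine-readable output should always be the first choice.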
I find that working directly with subprocess.Popen takes too much boilerplate code, so I usually create a utility that runs a command and always returns the stdout, stderr, and exit code. It looks like this:
import subprocess

def run(command):
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout_stream = process.stdout.read()
    stderr_stream = process.stderr.read()
    returncode = process.wait()
    if not isinstance(stdout_stream, str):
        stdout_stream = stdout_stream.decode('utf-8')
    if not isinstance(stderr_stream, str):
        stderr_stream = stderr_stream.decode('utf-8')
    stdout = stdout_stream.splitlines()
    stderr = stderr_stream.splitlines()
    return stdout, stderr, returncode
It accepts the command (as a list), then reads all the output produced by both stdout and stderr and decodes it if necessary. The utility splits the lines for later processing, making the output easier to consume.
Simple Parsing
The simplest parsing, using the run() example utility, is checking whether a specific line or string is in the output. For example, imagine you need to check if a given device is using the XFS filesystem. A utility needs to check whether the report mentions XFS, and nothing more. On my system, I can do this with the following:
$ sudo blkid /dev/sda1
/dev/sda1: UUID="8ac075e3-1124-4bb6-bef7-a6811bf8b870" TYPE="xfs"
So all I need to do is check if TYPE="xfs" is in the output. The utility becomes straightforward:
def is_xfs(device):
    stdout, stderr, code = run(['sudo', 'blkid', device])
    for line in stdout:
        if 'TYPE="xfs"' in line:
            return True
    return False
Interacting with this utility is easy enough, including when there are errors that the blkid tool reports but that don't matter for determining whether a device is using the XFS filesystem:
>>> is_xfs('/dev/sda1')
True
>>> is_xfs('/dev/sda2')
False
>>> is_xfs('/dev/sdfooobar')
False
In some cases, no parsing needs to happen at all because the exit code gives you everything you need to know. The system call and its exit code answer the question: "Was the command successful?" A quick example of this is the Docker command-line tool. Have you ever tried stopping a container? It is pretty simple: first, check that the container is running:
$ docker ps | grep pytest
542818cd6d7f anchore/inline-scan:latest "docker-entrypoint.sh" 14 minutes ago Up 14 minutes (healthy) 5000/tcp, 5432/tcp, 0.0.0.0:8228->8228/tcp pytest_inline_scan
I've confirmed it is running, so now I stop the pytest_inline_scan container and then check its exit code:
$ docker stop pytest_inline_scan
pytest_inline_scan
$ echo $?
0
The container is stopped, and its exit status code is 0. Although I highly recommend using the Python Docker SDK, you can use this minimal example as a guide when you only need to check an exit status:
def stop_container(container):
    stdout, stderr, code = run(['docker', 'stop', container])
    if code != 0:
        raise RuntimeError(f'Unable to stop {container}')
Advanced Parsing
There are multiple levels of painful parsing for command-line output. As I've mentioned earlier in this chapter, you should always try to tackle the problem with the simple approaches first: machine-readable output if possible, or just checking the exit code. This section dives into some of the advanced parsing I've implemented in production, where I avoid regular expressions until I've exhausted every other option.
Even though I highly recommend configuring tools to produce machine-readable
formats like CSV or JSON, sometimes this is not possible. One time, I saw that
the lsblk tool (another tool to inspect devices like blkid) had the --json
flag to produce an easily consumable output in Python. After creating the
implementation, I realized that this wouldn't work in older Linux distributions
because that special flag didn't exist, as it is a somewhat new feature. To have
one implementation that would work for older Linux versions as well as new ones,
I had to do the hard thing: parsing.
The first thing to do is to separate the implementation into two parts: one that runs the command, and another that parses the output. This is crucial because testing, maintaining, and fixing problems is easier when the pieces are isolated, with lots of tests to ensure the expected behavior. I start by running the command to produce the output I'm going to work with in the parser. I know what the command is and how the flags should be issued, so I want to verify the output before writing the parsing:
$ lsblk -P -p -o NAME,PARTLABEL,TYPE /dev/sda
NAME="/dev/sda" PARTLABEL="" TYPE="disk"
NAME="/dev/sda1" PARTLABEL="" TYPE="part"
The tool allows a little bit of machine-readable friendliness with the -P flag, which produces pairs of values, as if the output were going to be read by a Bash script. The -p flag uses absolute paths in the output, and finally, -o specifies the device labels I'm interested in. With that output, the parsing can get started. I want the parsing to return a dictionary, so I need to extract each label (NAME, for example) as a key, and then the value within the quotes. A good way to tinker with the parsing is to do it in the Python shell:
>>> line = 'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'
>>> line
'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'
>>> line.split(' ')
['NAME="/dev/sda"', 'PARTLABEL=""', 'TYPE="disk"']
>>> line.split('" ')
['NAME="/dev/sda', 'PARTLABEL="', 'TYPE="disk"']
I try two ways of splitting the line and decide to use the last one, which splits on the double quote because it partially cleans up the values. These are minor implementation details, and you can try other, possibly more efficient, ways of splitting. The important part is to separate the parsing from the command execution function, add tests, and narrow down the approach by playing with the output in a Python shell.
Now that I'm happy with the splitting, I do another pass of splitting each item
to produce the pairs:
>>> for item in line.split('" '):
...     item.split('="')
...
['NAME', '/dev/sda']
['PARTLABEL', '']
['TYPE', 'disk"']
>>>
Very close. There is a trailing quote that needs cleaning in the last item, but the good news is that the items are now nicely paired, so the parsing doesn't need to guess which value goes with which key. For example, if a value is empty, or if spaces make the splitting go wrong, the grouping makes it easier to handle. Let's try again, adding the items into a dictionary and cleaning up further:
>>> parsed = {}
>>> for item in line.split('" '):
...     key, value = item.split('="')
...     parsed[key] = value.strip('"')
...
>>> parsed
{'NAME': '/dev/sda', 'PARTLABEL': '', 'TYPE': 'disk'}
Excellent. The parsing side is complete, as I'm happy with the result in the shell. Before moving forward, I write a dozen unit tests to make sure I got this right. However, I know you are thinking that regular expressions would make this super easy, and that I'm plain silly for not writing a simple regular expression that splits on a group of uppercase letters. Regular expressions are not straightforward, and they are hard to test (no if or else conditions, and it is impossible to get a sense of coverage). Let's try a couple of rounds of regular expressions to achieve the same result.
First, and somewhat in cheating mode here, I split on whitespace:
>>> import re
>>> line = 'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'
>>> line
'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'
>>> re.split(r'\s+', line)
['NAME="/dev/sda"', 'PARTLABEL=""', 'TYPE="disk"']
I now have each item with its pair. Many regular expressions could be thrown at the items in this list to produce what we want. I'm the person at work who tries the simplest approach first and makes sure it works. Simple always wins for me. Instead of splitting further, I use a regular expression to get rid of the characters I don't need:
>>> for item in re.split(r'\s+', line):
...     re.sub('("|=)', ' ', item)
...
'NAME  /dev/sda '
'PARTLABEL   '
'TYPE  disk '
It looks broken without any delimiters or quotes, but this is fine because in the next step, I split on whitespace, which is the default for the .split() string method:
>>> for item in re.split(r'\s+', line):
...     result = re.sub('("|=)', ' ', item)
...     result.split()
...
['NAME', '/dev/sda']
['PARTLABEL']
['TYPE', 'disk']
In this case, result is each item with the unwanted characters replaced by whitespace. Then .split() removes that whitespace, creating the pairs. The final piece of code that produces the dictionary mapping is now easy:
>>> parsed = {}
>>> for item in re.split(r'\s+', line):
...     result = re.sub('("|=)', ' ', item)
...     key, value = result.split()
...     parsed[key] = value
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: not enough values to unpack (expected 2, got 1)
Oh no. What happened here? I was distracted and didn't realize that replacing the double quotes along with the equals character made some values disappear entirely. That is why PARTLABEL="" produces just one item, not two. The fix is to remove only the = character and clean up the quoted value later:
>>> for item in re.split(r'\s+', line):
...     result = re.sub('=', ' ', item)
...     key, value = result.split()
...     parsed[key] = value.strip('"')
...
>>> parsed
{'TYPE': 'disk', 'NAME': '/dev/sda', 'PARTLABEL': ''}
I mentioned I cheated because, in the end, I'm still following the pattern of splitting and then doing further cleanup. Separating the parsing process this way makes it easier to understand and improve when output doesn't conform to expectations. This chapter doesn't cover the testing part, but it is imperative to add as many tests as possible to ensure the parsing is correct.
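A sketch of what a few of those tests can look like, using plain assertions, with the line-parsing logic pulled into a small standalone helper (parse_line is a name I'm introducing here just for illustration):

```python
def parse_line(line):
    """Parse one line of `lsblk -P` style output into a dictionary."""
    parsed = {}
    for item in line.split('" '):
        key, value = item.split('="')
        parsed[key] = value.strip('"')
    return parsed

# Plain assertions covering the output shapes seen so far
assert parse_line('NAME="/dev/sda" PARTLABEL="" TYPE="disk"') == {
    'NAME': '/dev/sda', 'PARTLABEL': '', 'TYPE': 'disk'}
assert parse_line('NAME="/dev/sda1" PARTLABEL="" TYPE="part"') == {
    'NAME': '/dev/sda1', 'PARTLABEL': '', 'TYPE': 'part'}
assert parse_line('NAME="/dev/sdb"') == {'NAME': '/dev/sdb'}
```

Because the parser is a pure function with no subprocess call inside, each test is a one-liner with a literal input and an expected dictionary.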
The end result of these approaches gives us a nice API to work with:
def _lsblk_parser(lines):
    parsed = {}
    for line in lines:
        for item in line.split('" '):
            key, value = item.split('="')
            parsed[key] = value.strip('"')
    return parsed

def lsblk(device):
    command = [
        'lsblk',
        '-P',  # Produce pairs of key/value
        '-p',  # Return absolute paths
        '-o',  # Define the labels we are interested in
        'NAME,PARTLABEL,TYPE',
        device,
    ]
    stdout, stderr, code = run(command)
    return _lsblk_parser(stdout)
And the resulting API interaction with the functions:
>>> lsblk('/dev/sda')
{'NAME': '/dev/sda1', 'PARTLABEL': '', 'TYPE': 'part'}
>>> lsblk('/dev/sda1')
{'NAME': '/dev/sda1', 'PARTLABEL': '', 'TYPE': 'part'}
>>> lsblk('/dev/sdb')
{}
Shell Safety
One of the things that kept me from loving Python at first (I was coming from Bash) was that subprocess.Popen defaults to accepting a list for the command to run. This can get tiring and repetitive quite fast, but it is generally safer to use a list. That doesn't mean that using a plain string (accepted when shell=True gets passed to Popen) is always unsafe; it depends on where the string is coming from. In all the examples in this chapter, the input was curated and carefully constructed by the functions. Still, if the interfaces in a command-line tool accept input from a user, that is a security concern. With shell=True, Python spawns a sub-shell that evaluates the input first, expanding variables, for example, which can have undesired effects. In security, this is called shell injection, where crafted input can result in arbitrary command execution.
Even when you aren't accepting input from external sources, you are at the mercy of how the system interprets (or expands) variables and other sub-shell behavior.
In short: don't use shell=True to pass a whole string, and always sanitize input coming from the user.
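When you do find yourself holding a whole command string, the standard library's shlex module helps with both concerns; a small sketch:

```python
import shlex

# shlex.split turns a command string into the list form that Popen
# prefers, without ever involving a shell
command = shlex.split('lsblk -P -p -o NAME,TYPE /dev/sda')

# shlex.quote wraps untrusted input so a shell would treat it as one
# literal token instead of interpreting the semicolon
user_input = '/dev/sda; rm -rf /'
quoted = shlex.quote(user_input)
```

Quoting is a last line of defense; passing a list and never using shell=True remains the safer default.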
Production Best Practices
When building production CLI tools that integrate system commands, consider these additional patterns:
Error Recovery Strategies
- Implement fallback commands for different systems
- Cache command results when appropriate
- Provide graceful degradation when commands fail
- Log errors for debugging and monitoring
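The fallback idea from the Ceph story at the start of the chapter can be sketched like this; the tool names and arguments are illustrative, and shutil.which is used to skip tools that aren't installed:

```python
import shutil
import subprocess

def query_device(device, tools=('lsblk', 'blkid', 'parted')):
    """Try each tool in order of preference, falling back to the next
    one when a tool is missing or exits with an error."""
    for tool in tools:
        if shutil.which(tool) is None:
            continue  # not installed on this system, try the next
        result = subprocess.run(
            [tool, device], capture_output=True, text=True
        )
        if result.returncode == 0:
            return tool, result.stdout
    return None, ''  # every tool was missing or failed
```

The returned tool name also doubles as a log-friendly record of which fallback actually answered the query.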
Performance Optimization
- Use asynchronous execution for parallel commands
- Implement command result caching
- Stream large outputs instead of loading everything into memory
- Set appropriate timeouts for different command types
Testing System Integration
- Mock subprocess calls in unit tests
- Create integration tests with known command outputs
- Test error conditions and edge cases
- Validate parsing logic with various output formats
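As a sketch of the mocking idea, here is the chapter's is_xfs check restructured so the command runner is injected, which lets a unit test substitute a canned response via unittest.mock without touching a real device:

```python
from unittest import mock

def is_xfs(device, runner):
    """Check for an XFS filesystem; `runner` is injected so tests can
    replace the real subprocess-backed run() with a fake."""
    stdout, stderr, code = runner(['sudo', 'blkid', device])
    return any('TYPE="xfs"' in line for line in stdout)

# A fake runner returning canned blkid output: no subprocess involved
fake_run = mock.Mock(
    return_value=(['/dev/sda1: UUID="8ac075e3" TYPE="xfs"'], [], 0)
)
assert is_xfs('/dev/sda1', runner=fake_run) is True
fake_run.assert_called_with(['sudo', 'blkid', '/dev/sda1'])
```

The same fake can return non-zero exit codes or stderr lines to exercise the error paths that are hard to reproduce with real devices.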
Chapter Summary
Integrating system commands with Click applications requires careful attention to security, reliability, and user experience. Key concepts covered include:
- Subprocess Safety: Using lists instead of strings, avoiding shell injection vulnerabilities
- Robust Error Handling: Managing command failures, timeouts, and missing dependencies
- Output Parsing Strategies: From simple string matching to complex format parsing
- Production Patterns: Implementing fallbacks, caching, and comprehensive testing
The techniques learned in this chapter enable you to build CLI tools that seamlessly bridge Python and system commands while maintaining security and reliability standards.