
SONiC Bug Report: Systemd Generators – Dynamic Service Masking

Some Context

This post is part of the SONiC Bug Report series, documenting some of the tougher issues I solved for SONiC.

Related Issues:

Related Pull Request:

I was working off of the following commit:

All GitHub links in this post are permalinks from the above commit.

Log Spam on Warm Reboot

I recently got interested in the SONiC project and have been digging into the sonic-buildimage repository to understand how the system initializes. That was when I ran into issue #25091, whose console log shows concerning systemd failure messages on warm reboot.

Sep 03 19:12:42 sonic systemd[1]: systemd-networkd-persistent-storage.service: Job systemd-networkd-persistent-storage.service/st
Sep 03 19:32:00 sonic systemd[1]: systemd-networkd-persistent-storage.service: Bound to unit systemd-networkd.service, but unit i 
Sep 03 19:32:00 sonic systemd[1]: Dependency failed for systemd-networkd-persistent-storage.service - Enable Persistent Storage i 
Sep 03 19:32:00 sonic systemd[1]: systemd-networkd-persistent-storage.service: Job systemd-networkd-persistent-storage.service/st 
Sep 03 20:20:58 sonic systemd[1]: systemd-networkd-persistent-storage.service: Bound to unit systemd-networkd.service, but unit i 
Sep 03 20:20:58 sonic systemd[1]: Dependency failed for systemd-networkd-persistent-storage.service - Enable Persistent Storage i 
Sep 03 20:20:58 sonic systemd[1]: systemd-networkd-persistent-storage.service: Job systemd-networkd-persistent-storage.service/st
...............

The source of the errors seems to be a networkd-related service, which is only meant to run on "smart switches", i.e. switches with specialized compute hardware such as NPUs or DPUs for handling more complex tasks. It should not be running on regular, non-smart switches.

Attempt 1: Block via ExecCondition

This didn't seem like a very tough problem, mainly because there is an existing pattern to follow: systemd-networkd.service itself, the missing dependency, is already gated by systemd-networkd.override.conf:

[Service]
ExecCondition=/bin/bash /usr/local/bin/is-npu-or-dpu.sh
ExecStartPre=+/bin/bash /usr/local/bin/define-npu-specific-netdevs.sh

The bash script is-npu-or-dpu.sh does what its name suggests. Setting it as the ExecCondition stops systemd-networkd from starting when the machine is not a smart switch, i.e. has neither an NPU nor a DPU.
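As a reference point, systemd.service(5) spells out how an ExecCondition= result is interpreted: exit status 0 continues on to ExecStartPre= and ExecStart=, 1 through 254 skips the remaining commands without marking the unit as failed, and 255 or an abnormal exit (such as a signal or a timeout) marks the unit as failed. The sketch below only encodes that rule. ConditionOutcome and classify_exec_condition are hypothetical names, not part of systemd or SONiC, and SuccessExitStatus= is ignored for simplicity.

```cpp
// Hypothetical model of how systemd interprets an ExecCondition= result,
// following systemd.service(5). SuccessExitStatus= is ignored here.
enum class ConditionOutcome {
    Continue,  // proceed to ExecStartPre= / ExecStart=
    Skip,      // remaining commands skipped, unit NOT marked failed
    Fail,      // remaining commands skipped, unit marked failed
};

// exited_normally is false when the command was killed by a signal,
// timed out, etc.; exit_status is only meaningful when it is true.
ConditionOutcome classify_exec_condition(bool exited_normally, int exit_status) {
    if (!exited_normally) {
        return ConditionOutcome::Fail;
    }
    if (exit_status == 0) {
        return ConditionOutcome::Continue;
    }
    if (exit_status >= 1 && exit_status <= 254) {
        return ConditionOutcome::Skip;
    }
    return ConditionOutcome::Fail;  // exit status 255
}
```

Note that none of these outcomes is decided until the unit's start job is actually running.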

A similar pattern can be applied to systemd-networkd-persistent-storage, so the issue can probably be solved by defining a similar systemd-networkd-persistent-storage.override.conf and skipping the service conditionally.

Unfortunately, that did not work: the same dependency failure messages persisted.

This is because ExecCondition is only checked right before execution starts (technically, right before ExecStartPre). That is too late to prevent dependency-related errors, because dependencies are resolved much earlier. The service enters a failed state during the dependency check, triggering the alarm before its ExecCondition is ever evaluated.

Attempt 2: Resolve at Build Time

Now that we know ExecCondition runs too late, what is the earliest possible point to detect and resolve the problem? Build time! If we remove the networkd-related services during the build, all of these problems go away.

Unfortunately, that solution doesn't work either. At build time, all we can see is the platform, and a single platform may cover both smart switches and non-smart switches. The mellanox platform, for instance, includes DPU-equipped smart switches like the SN4280 as well as non-smart switches like the MSN4700, which has neither a DPU nor an NPU. Yet platform/mellanox/one-image.mk defines only one image file for all of them: sonic-mellanox.bin.

In retrospect, this makes sense: if the issue could be resolved at build time, we wouldn't need bash scripts like is-npu-or-dpu.sh to detect smart switches at runtime.

Attempt 3: Mask with Generator

Now we are forced to resolve the issue at runtime, but we still want to resolve it as early as possible. Since we're working with systemd, generators immediately come to mind. Generators run very early in boot (and again on every daemon-reload): late enough to have the runtime information needed to tell whether the machine is a smart switch, yet early enough to mask services long before systemd checks any unit's dependencies.
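For readers new to generators: per systemd.generator(7), a generator is an executable that systemd runs with three output directories as arguments, which at boot default to /run/systemd/generator, /run/systemd/generator.early and /run/systemd/generator.late. Unit files, drop-ins, or /dev/null symlinks written there are picked up when systemd loads its units. The sketch below only parses those arguments; GeneratorDirs and parse_generator_dirs are hypothetical names and do not reflect how systemd-sonic-generator.cpp is structured.

```cpp
#include <filesystem>
#include <optional>

// Output directories systemd passes to a generator, in argv order
// (systemd.generator(7)). Hypothetical struct for illustration.
struct GeneratorDirs {
    std::filesystem::path normal;  // argv[1]: overrides /usr/lib units, not /etc ones
    std::filesystem::path early;   // argv[2]: overrides even /etc units
    std::filesystem::path late;    // argv[3]: lowest priority of all
};

// Returns std::nullopt unless exactly three directories were passed.
std::optional<GeneratorDirs> parse_generator_dirs(int argc, const char* const argv[]) {
    if (argc != 4) {
        return std::nullopt;
    }
    return GeneratorDirs{argv[1], argv[2], argv[3]};
}
```

Masking a unit from a generator then boils down to creating a /dev/null symlink named after the unit in one of these directories, presumably the role of the install_dir argument in the functions below.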

To my surprise, when I looked at the generator code in systemd-sonic-generator.cpp, I noticed it already contains logic that detects smart switches by reading the platform JSON file. The detection is mainly used to set up the midplane bridge network and other smart-switch-specific configuration.

static int render_network_service_for_smart_switch(const std::filesystem::path& install_dir) {
    if (!smart_switch_npu) {
        return 0;
    }

    // Render Before instruction for midplane network with database service
    for (int i = 0; i < num_dpus; i++) {
        auto unit_override_dir = install_dir / std::format("database@dpu{}.service.d", i);
        std::filesystem::create_directory(unit_override_dir);

        auto unit_ordering_file_path = unit_override_dir / "ordering.conf";

        std::ofstream unit_ordering_file;
        unit_ordering_file.open(unit_ordering_file_path);
        unit_ordering_file << "[Unit]\n";
        unit_ordering_file << "Requires=systemd-networkd-wait-online@bridge-midplane.service\n";
        unit_ordering_file << "After=systemd-networkd-wait-online@bridge-midplane.service\n";
    }

    return 0;
}

Since we already have a battle-tested implementation of smart switch detection logic, we can use a similar function to conditionally mask the failing service:

static int mask_networkd_persistent_storage_for_non_smart_switch(const std::filesystem::path& install_dir) {
    if (smart_switch) {
        return 0;
    }

    auto service_path = install_dir / "systemd-networkd-persistent-storage.service";

    int r = symlink("/dev/null", service_path.c_str());

    if (r < 0) {
        if (errno == EEXIST)
            return 0;
        log_to_kmsg("Error masking %s: %s\n", service_path.c_str(), strerror(errno));
        return -1;
    }

    return 0;
}
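The function above is generic apart from the unit name, so it could be factored into a helper that takes the unit as a parameter. This is a hypothetical sketch, not the code from the pull request: mask_unit is an illustrative name, std::cerr stands in for the generator's log_to_kmsg, and on EEXIST it only reports success if the existing entry is already a symlink to /dev/null.

```cpp
#include <cerrno>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical generalization: masks `unit` by creating
// <install_dir>/<unit> as a symlink to /dev/null.
// Returns 0 on success (including "already masked"), -1 on error.
int mask_unit(const fs::path& install_dir, const std::string& unit) {
    const fs::path unit_path = install_dir / unit;

    if (symlink("/dev/null", unit_path.c_str()) == 0) {
        return 0;
    }
    const int err = errno;  // save before any other call can overwrite it

    if (err == EEXIST) {
        // Only treat EEXIST as success if the entry is already a mask;
        // read_symlink fails (returning an empty path) for non-symlinks.
        std::error_code ec;
        if (fs::read_symlink(unit_path, ec) == "/dev/null") {
            return 0;
        }
    }

    // std::cerr stands in for the generator's log_to_kmsg helper.
    std::cerr << "Error masking " << unit_path.string() << ": "
              << std::strerror(err) << '\n';
    return -1;
}
```

A caller would guard it the same way as the original, e.g. `if (!smart_switch) mask_unit(install_dir, "systemd-networkd-persistent-storage.service");`. The stricter EEXIST check is a design choice: without it, a regular unit file already sitting at that path would silently defeat the mask while the function still reported success.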

Note that smart_switch is essentially smart_switch_npu || smart_switch_dpu, so it plays the same role as is-npu-or-dpu.sh, just much earlier in boot.

The result? We have no more warnings from this specific service, so that's a win!

Source of the Cascade

Although our patch fixed systemd-networkd-persistent-storage.service, I noticed people were getting similar error messages from other networkd-related units, such as systemd-networkd-wait-online.service.

Additionally, systemd-networkd.socket was starting on non-smart switches for some reason, blocking the sonic-mgmt testing pipeline in issue #24523. Since systemd-networkd.socket is a socket unit rather than a service, we can't add an ExecCondition to stop it: that setting only exists for service units.

While masking the storage service silenced the immediate error, it felt like playing Whac-A-Mole. We could mask every individual service and socket, but that's fragile, since dependent units can be added or modified at any time. It's just not sustainable to ship another patch for every new networkd-related error.

Up to this point, I had been assuming that some random snippet in the codebase was trying to activate a service regardless of the hardware on the machine. But now there was a whole group of problematic units, all tracing their ancestry back to systemd-networkd.service, which suggested a common cause behind them.

To find that cause, we have to go back to the first failed attempt, where we learned that ExecCondition is evaluated too late: only right before the service executes! By that time, the jobs for all of its dependent units have already been enqueued. Even if ExecCondition fails for a service, it does nothing to withdraw those already-enqueued jobs.

Thus, the real source of the issue is systemd-networkd.override.conf using ExecCondition to conditionally stop systemd-networkd.service while leaving all of its dependent units queued. Those units then wake up to find systemd-networkd.service down and either complain or keep running when they shouldn't, causing an entire cascade of errors.
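To make the timing argument concrete, here is a deliberately simplified toy model, not systemd's actual transaction logic. It assumes, for illustration only, that every dependent unit is pulled in solely through the root unit and is bound to it; boot and Strategy are hypothetical names. Phase 1 builds the job queue, phase 2 runs it, and ExecCondition is only consulted in phase 2.

```cpp
#include <string>
#include <vector>

// Toy model only, NOT systemd's transaction engine. Assumption for
// illustration: every dependent is pulled in solely through `root`
// and is bound to it (like BindsTo=).
enum class Strategy { ExecCondition, Mask };

// Returns one event line per job, in execution order.
std::vector<std::string> boot(const std::string& root,
                              const std::vector<std::string>& dependents,
                              bool is_smart_switch, Strategy strategy) {
    std::vector<std::string> events;

    // Phase 1: build the job queue. A masked root gets no job and,
    // in this model, pulls nothing in.
    if (strategy == Strategy::Mask && !is_smart_switch) {
        return events;
    }
    std::vector<std::string> queue{root};
    queue.insert(queue.end(), dependents.begin(), dependents.end());

    // Phase 2: run the jobs. ExecCondition= on `root` is evaluated only
    // now, after the dependent jobs were already queued in phase 1.
    bool root_active = false;
    for (const std::string& unit : queue) {
        if (unit == root) {
            root_active = is_smart_switch;
            events.push_back(unit + (root_active ? ": started" : ": skipped by ExecCondition"));
        } else {
            events.push_back(unit + (root_active ? ": started" : ": dependency failed"));
        }
    }
    return events;
}
```

With Strategy::ExecCondition on a non-smart switch, the root is skipped but every dependent job still runs and reports a dependency failure, mirroring the log spam above. With Strategy::Mask, the root never enters the queue, so nothing is pulled in through it.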

At this point, the solution is pretty clear. We can still use the masking approach, but this time we mask systemd-networkd.service itself instead of its dependent units:

static int mask_networkd_for_non_smart_switch(const std::filesystem::path& install_dir) {
    if (smart_switch) {
        return 0;
    }

    auto service_path = install_dir / "systemd-networkd.service";

    int r = symlink("/dev/null", service_path.c_str());

    if (r < 0) {
        if (errno == EEXIST)
            return 0;
        log_to_kmsg("Error masking %s: %s\n", service_path.c_str(), strerror(errno));
        return -1;
    }

    return 0;
}

And voilà! The issues are solved. This approach is also a bit cleaner: instead of letting systemd-networkd.service get queued and then bail out in ExecCondition, which may trigger warnings or alert monitoring services, the service is masked outright, so systemd never runs it on non-smart switches.

To conclude, this was an interesting lesson in systemd timing: when dependency errors are involved, ExecCondition is sometimes not early enough. If you need to conditionally alter services with complex dependency relationships early at runtime, generators are probably a better tool for the job.