Karn aims to provide for Linux what entitlements provide for iOS, or what pledge provides for OpenBSD. It does this by translating a high-level set of intuitive ‘entitlements’ into complex seccomp profiles.
For example, a developer using Karn can simply specify that their application needs to make network connections or exec other processes, and Karn will handle granting it permission to do those things (and nothing else!).
So, how does it work?
Seccomp is a powerful security system inside the Linux kernel that allows programmers to limit system call privileges for running processes. This is useful because you can do some scary things via system calls, such as loading a kernel module, installing BPF programs, or rebooting a machine.
Most container runtimes use seccomp as a way of limiting privilege by default. If a program running inside the container is potentially exploitable, seccomp is a useful line of defense, keeping attackers from doing serious damage.
Here’s an example of a seccomp configuration that you can use to block the getcwd system call and allow all others:
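As a rough sketch, a profile of this shape can also be generated programmatically. The Go program below assumes the Docker-style seccomp JSON layout (`defaultAction`, `syscalls`, `names`, `action`); the `Profile` and `Syscall` types and the `blockGetcwd` helper are illustrative names, not part of Karn.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Syscall is one rule in a Docker-style seccomp profile.
type Syscall struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// Profile mirrors the top-level fields of a Docker-style seccomp profile.
type Profile struct {
	DefaultAction string    `json:"defaultAction"`
	Syscalls      []Syscall `json:"syscalls"`
}

// blockGetcwd (illustrative helper) returns a profile that allows every
// system call by default and makes getcwd fail with an error.
func blockGetcwd() Profile {
	return Profile{
		DefaultAction: "SCMP_ACT_ALLOW",
		Syscalls: []Syscall{
			{Names: []string{"getcwd"}, Action: "SCMP_ACT_ERRNO"},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(blockGetcwd(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Saved to a file such as profile.json, the output can be handed to Docker with `--security-opt seccomp=profile.json`.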
You can run a container with the above profile. When a process tries to use the getcwd syscall, a seccomp filter will be run that prevents it:
Although containers have default profiles which grant all the permissions most applications will need, there are plenty of potential applications which will not be allowed. In those cases most people will instantly go for running their container with the privileged flag, disabling seccomp, AppArmor, and SELinux, and granting full device access. This opens applications up to a whole slew of risks. It’s also an example of one of my most strongly held beliefs in security:
If it’s easier to disable a security control than it is to configure, it’s getting disabled.
For example, when was the last time you heard about someone configuring SELinux instead of turning it off when they got a violation?
The reason so few people write custom seccomp configurations instead of turning seccomp off is that it requires a ton of domain knowledge of system calls, and a lot of work to profile applications. There are hundreds of system calls, they’re often architecture specific, and different versions of the same library may use different variants of the same system call (e.g. fchownat vs. fchown).
A common approach people could take is to trace and profile their applications to see what system calls they use and generate a profile based on that (example). This, however, is fraught with difficulty, and the result is so fragile that it doesn’t actually stop people from disabling seccomp.
The central philosophy that led me to write Karn is this: In order to have effective and sustainable security, the operator has to strike a balance between usability and actual effectiveness.
With that in mind, let’s look at the design principles of Karn:
1. Use high-level entitlements that match how a developer/user thinks about their application.
People think of software as needing to make network connections, not needing to recvfrom. Karn accomplishes this with its custom-written set of entitlements.
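To make the idea concrete, here is a minimal sketch of how an entitlement could expand into system calls. The entitlement names (`network_connection`, `exec`), the syscall groupings, and the `deniedSyscalls` helper are all assumptions made for illustration; they are not Karn’s actual entitlement table or API.

```go
package main

import (
	"fmt"
	"sort"
)

// entitlementSyscalls is a hypothetical mapping from entitlement names to
// the system calls they cover. It is NOT Karn's real table.
var entitlementSyscalls = map[string][]string{
	"network_connection": {"socket", "connect", "sendto", "recvfrom"},
	"exec":               {"execve", "execveat"},
}

// deniedSyscalls returns, in sorted order, every system call that belongs
// to an entitlement the application did not ask for.
func deniedSyscalls(granted []string) []string {
	grantedSet := make(map[string]bool, len(granted))
	for _, g := range granted {
		grantedSet[g] = true
	}
	var denied []string
	for name, calls := range entitlementSyscalls {
		if !grantedSet[name] {
			denied = append(denied, calls...)
		}
	}
	sort.Strings(denied)
	return denied
}

func main() {
	// The developer only says "I need the network"; exec-related calls
	// end up on the deny side.
	fmt.Println(deniedSyscalls([]string{"network_connection"})) // prints [execve execveat]
}
```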
2. Let developers use Karn how they want to.
If the goal is to make seccomp as easy and accessible as possible, then Karn shouldn’t change anyone’s workflow. It accomplishes this by generating OCI-compliant profiles for containers. It also provides simple libraries that you can use in your non-containerized programs to enforce seccomp rules at the start of your process execution. Currently, libraries are available in both C and Go, with more languages to come.
3. Denylist instead of allowlist.
This could be a controversial one. The seccomp man page and community typically encourage creating an allowlist. This means specifying what system calls are allowed, and denying all others by default. This protects you if a newer kernel introduces a new system call which is potentially dangerous.
However, I feel this is too much of an ask for most users. Profiling an application is a heavy lift, and the resulting profile is fragile. Non-profiling techniques include static analysis, which isn’t effective.
Missing a needed system call is far too common an occurrence. If the generated profile breaks users’ applications, they aren’t going to give Karn another chance. The benefit of denying large swaths of dangerous system calls is so great that it’s worth the small risk of missing one, if it means Karn will actually be used.
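The trade-off can be sketched with a toy model of the two policy styles. The `Filter` type below is purely illustrative (it is not how seccomp or Karn is implemented); it only shows what happens to a system call that the profile author never thought about.

```go
package main

import "fmt"

// Filter is a toy model of a seccomp-style policy: a set of explicitly
// listed system calls plus a default for everything else.
type Filter struct {
	Listed       map[string]bool // system calls named in the profile
	DefaultAllow bool            // true = denylist, false = allowlist
}

// Permits reports whether the filter lets the given system call through.
// Listed calls get the opposite of the default action.
func (f Filter) Permits(syscall string) bool {
	if f.Listed[syscall] {
		return !f.DefaultAllow
	}
	return f.DefaultAllow
}

func main() {
	allowlist := Filter{Listed: map[string]bool{"read": true, "write": true}, DefaultAllow: false}
	denylist := Filter{Listed: map[string]bool{"reboot": true, "init_module": true}, DefaultAllow: true}

	// A call the profiler missed: the allowlist breaks the application,
	// the denylist lets it keep working.
	fmt.Println(allowlist.Permits("fchownat"), denylist.Permits("fchownat")) // false true

	// A known-dangerous call is blocked either way.
	fmt.Println(allowlist.Permits("reboot"), denylist.Permits("reboot")) // false false
}
```

The flip side is also visible in the model: a dangerous call that is missing from the denylist is permitted, which is the small risk accepted above.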
For more hands-on documentation, check out the quickstart over on GitHub. Feel free to reach out on Twitter to discuss, or if I’ve convinced you to give Karn a try, I encourage you to create issues or make a contribution! Above all, I hope you keep in mind the balance worth striking between effective and usable security.