Securing AI on Untrusted Infrastructure With Kata Containers & Confidential Computing
- Steve Younger

- Dec 1
- 14 min read

Artificial intelligence workloads are becoming some of the most valuable assets an organization runs. Models capture years of research and tuning. Datasets encode customer behavior, health information, or financial risk profiles. At the same time, more of that compute is running on infrastructure you do not fully control, whether that is a public cloud, a partner data center, or a cluster of GPUs at the edge sitting in a closet in a retail store.
Traditional security has mostly focused on two places. Data at rest lives on disk and can be protected with volume or database encryption. Data in transit moves over the network and can be protected with TLS and VPNs. The gap is data in use. Once data is loaded into memory and a model is running on it, anyone who can get a foothold on the host has a real chance of inspecting or tampering with it. For AI workloads that are often GPU accelerated and multi-tenant, that risk multiplies quickly.
Because of that gap, many organizations in regulated industries have traditionally kept their most sensitive AI workloads on premises. Banks, healthcare providers, governments, and defense organizations want cloud scale, but not at the cost of losing control over model weights or regulated data. They need something that lets them use untrusted infrastructure while still treating their data and models as confidential.
This is where Kata Containers and confidential computing come together. Kata Containers gives each workload a dedicated lightweight virtual machine that behaves like a container from a Kubernetes point of view. Confidential computing adds hardware backed encryption and integrity for that virtual machine while it is running. Combined, they let you run AI workloads on Kubernetes in a way that assumes the infrastructure is untrusted but still keeps the workload protected.
In this article, we will walk through what Kata Containers actually does, how confidential computing works at the hardware level, how confidential containers fit into Kubernetes, what it looks like to attach GPUs, and where the trade-offs and real world use cases are. The goal is to give you both a clear mental model and enough technical depth to decide where this pattern fits in your own stack.
Rethinking isolation with Kata Containers

A normal Kubernetes pod runs one or more containers that all share the host kernel. Linux namespaces and cgroups create the illusion of isolation, but in reality there is one kernel underneath everything. If an attacker manages a container breakout or finds a kernel vulnerability, their path to the underlying node and other workloads is short.
Kata Containers approaches that problem differently. Instead of running a container directly on the host kernel, Kata starts a small virtual machine for the pod and runs the container inside that VM. Each pod gets its own guest kernel and virtualized hardware. From Kubernetes’ perspective, you still see pods, images, and the usual scheduling model. Under the covers, each of those pods lives inside a microVM.
Think of it as giving each pod its own tiny, purpose built virtual machine that boots a stripped down kernel and user space tuned for containers. The guest image is minimal so startup time is still low, especially compared to traditional full operating system VMs. The payoff is that the isolation boundary moves from “processes sharing a kernel” to “separate virtual machines enforced by hardware virtualization.”
If someone compromises a workload inside a Kata pod, they are now trapped inside that pod’s VM boundary. Escaping to the host or another pod forces them to break through the hypervisor as well as the guest kernel. That layered isolation is much harder to bypass.
Kubernetes integrates with Kata through RuntimeClass. You define a runtime class for Kata and reference it in your pod specs when you want that stronger isolation. Developers can keep using the same container images, manifests, and GitOps pipelines. Operations teams can decide which namespaces or workload types should run with Kata, and the scheduler takes care of the rest.
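As a rough sketch, wiring this up takes two small pieces of YAML: a RuntimeClass whose handler matches the Kata runtime configured on the node, and a pod that opts into it. The names and image below are illustrative, not prescriptive.

```yaml
# RuntimeClass pointing at the Kata handler configured in the container runtime.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata            # must match the runtime handler name on the node
---
# The same container image, now launched inside a Kata microVM.
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  runtimeClassName: kata
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest   # illustrative image
```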
On its own, Kata already improves the story for multi-tenant isolation and blast radius reduction. What it does not do, though, is protect the workload from the host. That is where confidential computing comes in.

Tech Stacks Tip Callout:
Same Pod Spec, Stronger Wall
You do not need a special image to use Kata Containers. The same container image can run as a traditional pod or inside a Kata microVM. The change happens in the runtimeClass and node configuration, not in your Dockerfile.
How confidential computing protects data in use
Confidential computing flips the trust problem around. Kata makes it harder for a compromised workload to attack the node. Confidential computing protects the workload from a potentially compromised or curious node.
At a high level, confidential computing uses hardware support in modern processors to create a trusted execution environment. A TEE is an isolated region of memory where code and data remain encrypted from the point of view of everything outside that environment. The CPU decrypts and re-encrypts data on the fly as instructions execute, but the hypervisor, host operating system, and other processes only ever see ciphertext.

Tech Stacks Tip Callout:
Encrypted Disk Is Not Enough
Disk encryption protects your data when the VM is powered off or the volume is detached. Confidential computing protects your data while the CPU is actively using it. If an attacker can read memory from the host, disk encryption alone will not stop them.
Different vendors implement this in different ways. The details vary, but the pattern is similar. You create a confidential virtual machine or enclave. The CPU sets up page tables and memory regions such that all of the VM’s memory is encrypted and integrity protected. Memory controllers include an encryption engine, so any data leaving the CPU package gets encrypted automatically. If someone snapshots the memory from the hypervisor or physically probes the DIMMs, the result is an unreadable blob.

Confidential computing also gives you a way to prove what is running inside that protected environment. This is called remote attestation. The processor can produce a signed report that describes the identity of the code and configuration that booted inside the TEE, along with platform details such as firmware versions and whether confidential mode is actually enabled. An external attestation service verifies that report against hardware root keys and an allow list of expected measurements.
Only after that verification passes should sensitive material flow into the environment. A key management system or secrets manager can require a valid attestation report before releasing decryption keys, credentials, or model artifacts. From an application point of view, it feels like “call an API, present your attestation token, and get back the secret.” Underneath, you have turned the CPU into an active participant in the trust chain, rather than treating it as a black box.
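To make that concrete, imagine a key-release policy attached to your key manager. The schema below is purely illustrative, not the policy language of any particular attestation service or key broker, but the shape is representative: keys are released only when the attested measurements and platform details match an allow list.

```yaml
# Hypothetical key-release policy; field names and values are illustrative only.
key: model-weights-kek
release_when:
  tee_enabled: true                  # report must come from a genuine confidential VM
  debug_disabled: true               # refuse guests booted with debug features
  allowed_guest_measurements:        # allow list of approved guest image hashes
    - "sha384:7f3a9c..."             # placeholder, truncated for readability
  minimum_firmware_version: "1.55"
```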
The result is that your workload can run on someone else’s hardware while still treating that hardware as untrusted. The cloud provider, the hypervisor, and even a privileged host user cannot look into the memory of the confidential VM or silently modify it without detection.
Confidential Containers: Combining Kata and trusted execution
When you combine Kata Containers with confidential computing, you get confidential containers. The idea is straightforward. Kubernetes still schedules a pod. The container runtime still receives a request to start that pod. Instead of launching a regular container or a regular Kata VM, the runtime creates a confidential virtual machine and runs the pod inside it.
In practice, a few extra moving parts show up. Nodes that support confidential containers need CPUs with TEE features enabled, firmware configured correctly, and the Kata runtime installed with a guest image that is built to run as a confidential VM. Those nodes are usually labeled so the scheduler knows which workloads can land there.

When you deploy a pod that requests the confidential runtime, the sequence looks roughly like this. The scheduler places the pod on a node that is flagged as capable of running confidential VMs. The container runtime uses Kata to create a microVM for the pod, but with confidential mode turned on so the VM’s memory region is encrypted and isolated by the CPU. An attestation agent inside that VM generates an attestation report and sends it to an external attestation service. If the report checks out, the attestation service or an integrated key manager releases the secrets the application needs, such as decryption keys for models or data. Only then does the main container process start and begin actual work.
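In pod terms, requesting that flow can be as small as a runtime class name and a node selector. A minimal sketch, assuming a runtime class named kata-cc and nodes labeled confidential=true as described later in this article:

```yaml
# Minimal confidential pod; runtime class, label, and image names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: confidential-inference
spec:
  runtimeClassName: kata-cc          # confidential Kata runtime
  nodeSelector:
    confidential: "true"             # land only on TEE-capable nodes
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest
```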

Tech Stacks Tip Callout:
No Attestation, No Secrets
Treat attestation as a hard gate. The confidential VM must prove its identity and configuration before your key manager releases model weights, API keys, or decryption keys. If attestation fails, the pod may still start, but it must be starved of secrets so it fails closed instead of failing open.
If any part of that attestation flow fails, the workload should not receive its secrets. In many designs, the pod will still come up, but it will be unable to decrypt its model or reach critical backend services, which makes the failure mode a safe one. This pattern aligns well with the idea of “do not trust the platform unless it can prove its state and hardware identity.”
From a lifecycle perspective, the confidential VM exists only as long as the pod does. When you delete the pod or the controller scales it down, the confidential VM is torn down as well. That ephemerality is important because it reduces the chance that residual data lingers in memory or on disk after the workload finishes.
The CNCF Confidential Containers project (often shortened to CoCo) pulls these pieces together into a common architecture and set of components. It provides the Kubernetes integrations, runtime configuration, and attestation plumbing needed to make confidential containers behave like a first class citizen in a cluster, rather than a one off science project.
GPU acceleration inside Confidential Containers
Most real AI workloads depend on GPUs or other accelerators. Protecting CPU memory is not enough if your model or intermediate tensors spend most of their time on the GPU. Historically, GPUs were designed primarily for throughput, not for strict multi-tenant isolation. That has made them a weak spot in security discussions.
There are a few problems to solve. GPU drivers often run with high privilege on the host, which increases the attack surface if they contain bugs. GPU memory can retain data between workloads if it is not scrubbed correctly. Traditional GPU sharing approaches slice one physical GPU across multiple tenants, which is convenient for utilization but difficult to secure to the same standard as a TEE.
The most practical approach today for confidential containers is to attach GPUs directly to the Kata VM using PCI passthrough or a similar mechanism. Instead of a host level driver managing all tenant workloads, the confidential VM owns the device. The driver stack runs inside the VM, and the GPU’s memory is addressable only from that guest context.
This has several implications. First, performance is very close to bare metal. The VM is talking to the GPU at the hardware level, so there is no extra software translation layer on the hot path. Second, the trust boundary is easier to reason about. The GPU is dedicated to that VM, and the large, complex driver lives inside the protected environment rather than on the host.
The obvious trade-off is sharing. If you assign a full GPU to each confidential VM, you sacrifice some utilization flexibility. Today, most confidential container implementations lean on that dedicated model because it keeps isolation simple and strong. Workloads that need to scale across multiple GPUs often coordinate through distributed training frameworks that span multiple confidential VMs, each with its own attached GPU.
GPU vendors are beginning to add features specifically targeted at confidential computing, such as encrypting GPU memory or isolating GPU contexts at the hardware level. Over time, that will open the door to something closer to “confidential vGPU” where you can safely time slice a GPU between tenants while still maintaining strict separation. For now, if you care about AI security on untrusted infrastructure, the safe baseline is to map one physical GPU into one confidential VM and treat that pairing as a protected island.
On the Kubernetes side, you still want a device plugin on the node that advertises GPUs to the scheduler. The difference is that on a confidential node, those devices are bound to the appropriate low level driver for passthrough and handed directly to the Kata runtime, which wires them into the guest VM during startup.
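Assuming the NVIDIA device plugin is advertising GPUs on those nodes, a confidential training pod simply requests a whole device and the runtime hands it through to the guest. A sketch, with illustrative names:

```yaml
# GPU-accelerated confidential pod; assumes the NVIDIA device plugin is installed.
apiVersion: v1
kind: Pod
metadata:
  name: finetune-job
spec:
  runtimeClassName: kata-cc
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1      # one whole GPU passed through to this confidential VM
```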
Designing your Kubernetes cluster for confidential workloads
You do not turn an entire cluster into a confidential environment overnight. A more realistic pattern is to dedicate a subset of nodes to confidential workloads and let everything else keep running as regular containers.
Start by identifying a pool of worker nodes that will be your secure landing zone. Those nodes need CPUs and firmware that support confidential VMs, BIOS settings turned on for memory encryption, the Kata Containers runtime installed, and the guest images built with the right kernel and userspace. You then label those nodes with something explicit, for example confidential=true and perhaps gpu=true if they have accelerators.
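On the node object itself, those labels end up looking something like this, whether you apply them by hand with kubectl label or, as described next, with a discovery tool. The names are illustrative:

```yaml
# Labels on a confidential, GPU-capable worker node; label names are illustrative.
apiVersion: v1
kind: Node
metadata:
  name: worker-gpu-07
  labels:
    confidential: "true"
    gpu: "true"
```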
Next, you introduce the components that make this usable from Kubernetes. A node feature discovery tool can scan hardware capabilities and apply labels automatically so you do not have to manage them by hand. The Confidential Containers operator (if you adopt CoCo) can deploy and configure Kata, the guest images, and the guest side attestation agents. A GPU device plugin can advertise GPU resources and ensure they are prepared for passthrough to confidential VMs.
On the control plane side, you configure RuntimeClass objects that represent different options. For example, you might have a standard runtime for regular containers, a kata runtime for non confidential Kata pods, and a kata-cc runtime for confidential ones. You then use pod specs, namespaces, or higher level abstractions like ApplicationSets or Helm values to decide which workloads should use which runtime.
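A sketch of the confidential class, assuming a kata-cc handler is configured on those nodes; the kata class looks like the earlier example, and regular containers need no RuntimeClass at all. The scheduling block keeps these pods on labeled nodes, and the overhead block, with purely illustrative values, lets the scheduler account for the guest VM's footprint:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-cc
handler: kata-cc                 # confidential Kata handler configured on the node
scheduling:
  nodeSelector:
    confidential: "true"         # only place these pods on TEE-capable nodes
overhead:
  podFixed:
    memory: "350Mi"              # rough per-pod guest kernel overhead, illustrative
    cpu: "250m"
```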
Attestation and key management sit slightly aside from Kubernetes but integrate tightly with it. You need an attestation service that can receive quotes from guest VMs, verify them, and talk to your key manager. Your key manager needs policies that say which workloads are allowed to receive which keys, based on attested measurements and metadata such as expected image digests or cluster identity. You can wire that together through sidecars, init containers, or in some cases a node level agent that injects secrets into the guest over a secure channel once attestation succeeds.
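How the released secrets reach your process varies by design. One common shape, sketched below with purely illustrative image names and paths, is a memory-backed volume plus an init container that blocks until the attestation-gated secret has been delivered, so the main container never starts without its keys:

```yaml
# Fragment of a confidential pod spec; image names and mount paths are illustrative.
spec:
  runtimeClassName: kata-cc
  initContainers:
  - name: wait-for-model-key
    image: registry.example.com/attested-secret-fetcher:latest  # hypothetical helper
    volumeMounts:
    - name: keys
      mountPath: /keys
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest
    volumeMounts:
    - name: keys
      mountPath: /keys
      readOnly: true
  volumes:
  - name: keys
    emptyDir:
      medium: Memory       # keys live only in the encrypted guest memory, never on disk
```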
From a developer experience point of view, the aim is to keep things simple. Ideally, the main difference between a normal AI job and a confidential one is a runtimeClassName in the spec and perhaps a different Kubernetes Secret reference for model keys. Everything else should look and feel like a normal deployment. That separation of concerns is one of the reasons Kubernetes is a natural fit for confidential computing: operators can deal with the hardware and attestation plumbing, while developers keep their mental model focused on pods and services.
Performance and practical constraints
Running AI workloads inside Kata based confidential containers does introduce overhead, but in many scenarios the cost is smaller than people expect.
There is basic VM overhead. Starting a microVM with its own kernel and user space consumes additional memory and a bit of CPU time compared to a plain container. Kata images are intentionally small, so the absolute numbers are modest. For large AI jobs that already consume significant resources, the relative overhead is usually low.
Memory encryption adds its own cost. Every load and store inside the TEE has to pass through the encryption engine. On older hardware, that can show up as a noticeable slowdown. On current generation processors, the penalty is often a single-digit percentage for many workloads. The exact impact depends heavily on access patterns and the size of the working set. GPU heavy training jobs often remain bottlenecked on the GPU, so the CPU side encryption overhead fades into the noise.
GPU performance is the bright spot. When you attach a GPU through passthrough, the VM sees it almost as if it were bare metal. Benchmarks from multiple vendors have shown near native throughput for training and inference inside confidential VMs compared to traditional VMs or bare metal. The main delay is at startup, when the system has to boot the guest OS, perform attestation, and receive keys.
Density is where you pay a bit more. Because each confidential container is really a VM with its own kernel, you will fit fewer of them on a node compared to plain containers. Memory fragmentation and per guest overhead add up. If your goal is to pack high volume, low sensitivity microservices densely on a node, confidential containers are not the right tool. They are a better fit for fewer, more valuable workloads where the isolation is worth the extra cost.

Tech Stacks Tip Callout:
Save Confidential Mode for Crown Jewels
Not every workload requires the extra complexity of confidential containers. Start with assets that would really hurt if they leaked, such as proprietary model weights, training datasets, or regulated information. Let low risk microservices keep running on standard nodes without the extra overhead.
Operational complexity is also higher. Someone has to understand firmware versions, BIOS flags, guest kernel builds, attestation policy, and integration with your secrets management system. There are operators and reference architectures that make this easier, but it is still more work than running a default Kubernetes cluster. The trade off is that once you invest in that setup, you get a reusable foundation that many teams and applications can build on.
Finally, you should factor in attestation latency. Every time a confidential pod starts, it needs to obtain and verify a quote before keys are released. That can add a few seconds to startup, especially if your attestation service is remote or under load. For long running AI training jobs, that cost is negligible. For extremely latency sensitive, short lived workloads, you will need to decide whether the security benefit justifies the slower cold start.
Where confidential containers add the most value

Once the pieces are in place, confidential containers unlock a set of scenarios that are hard to address with traditional controls.
Protecting intellectual property in AI models is the obvious first one. If your competitive advantage depends on a set of model weights, you probably do not want to leave those sitting in plaintext in someone else’s memory. Hosting the model inside a confidential container means even a cloud administrator cannot dump the process and walk away with your IP. You can take advantage of cloud scale GPUs while still keeping a strong boundary around your models.
Joint analytics and multi party collaboration are another fit. Imagine several organizations that want to train a model on combined data but cannot share raw records with one another. A confidential environment in a neutral cloud can run the training code, with each participant encrypting their contribution and relying on attestation to confirm that only approved code will see it in decrypted form. No single party has unilateral access to the combined dataset, but everyone benefits from the model.
Regulated industries can use confidential containers to extend workloads into locations that would previously have been off limits. A bank might run risk models in a public cloud region while still satisfying requirements that customer data remains protected from infrastructure operators. A healthcare provider might run inference at the edge inside a hospital’s equipment closet and still have a strong story for protecting patient data even if someone walks off with the hardware.
Edge deployments benefit from the physical security angle. If you ship a small footprint AI node to a store, a factory, or a remote site, you have to assume that someone could plug in a keyboard or remove the drives. Encrypting disks is necessary but not sufficient. Confidential VMs keep sensitive model and data content encrypted even while the application is live. If the hardware is stolen or tampered with, what is left behind on memory modules or local storage is not useful.
Finally, cloud providers and internal platform teams can use confidential containers to strengthen tenant isolation. Even if tenants do not explicitly ask for confidential computing, offering confidential node pools or confidential GPU instances can become a differentiator. It signals that workloads are protected not only from other customers but also from overly curious operators inside the provider.
Bringing it all together
Running AI on untrusted infrastructure forces you to answer a hard question: how much do you really trust the platform that is running your most valuable workloads? With Kata Containers and confidential computing, you can shift that question from blind trust to measurable guarantees.
Kata gives each pod its own lightweight virtual machine and moves the isolation boundary down into hardware enforced virtualization. Confidential computing turns that virtual machine into a trusted execution environment that keeps data in use encrypted and integrity protected, even from the host. Together, they form confidential containers that fit naturally into a Kubernetes cluster.
We looked at how confidential containers work end to end, from node requirements, through attestation, to secret delivery. We explored how GPUs fit into the picture today through dedicated passthrough, and where the ecosystem is heading as vendors add features for confidential GPU compute. We also walked through the practical side. There is extra overhead, there is real operational complexity, and today you do give up some flexibility in how you share hardware. Those are not deal breakers, but they are trade-offs you should go into with eyes open.
For organizations that care deeply about protecting AI models and data while still taking advantage of cloud and edge compute, confidential containers offer a compelling path forward. They let you treat the underlying infrastructure as untrusted without sacrificing performance or developer ergonomics. Kubernetes remains the control plane that ties it all together.
As this space matures, expect to see more managed offerings, better GPU isolation, and tighter integrations with key management and policy engines. The underlying principle will stay the same. Rather than assuming the platform is safe, you require it to prove its identity and configuration before you hand it anything sensitive. If you are designing an AI platform today and you expect to run on hardware you do not fully control, confidential containers are worth putting on your short list of patterns to understand and pilot.



