OpenClaw Security Platform: Detection, Blocking & How It Works

OpenClaw is a continuously running AI agent orchestration application that has access to your shell, your file system, your messaging apps, and your APIs. It ships with none of the controls you would expect from software that has that kind of reach.

Zenity, a company focused on AI security, released an open-source framework this week to fill that gap. Called the OpenClaw Security Platform, it adds detection and blocking capabilities directly inside OpenClaw agent workflows. The framework is available now and does not require forking OpenClaw.

The Problem It Solves

OpenClaw's architecture is intentionally open. It lets developers wire AI agents to essentially any external system through a plugin model. That openness is what makes it useful, and it is also what makes it dangerous.

An agent operating through OpenClaw can execute shell commands, read and write files, send messages, and call external APIs, all in sequence, autonomously, using the user's credentials. More than 21,000 OpenClaw instances were exposed with no authentication during OpenClaw's early weeks. The lack of a native security layer compounds that problem: even a properly authenticated instance has no built-in mechanism to inspect what its agents are actually doing at runtime.

The OpenClaw Security Platform is designed to be dropped into an existing OpenClaw setup without migration. It operates as either a lightweight plugin or a full reverse proxy and evaluates agent activity at three checkpoints in the execution flow.

Three Checkpoints

The framework intercepts events at points where something meaningful is about to happen or has just happened.

message.before: Inbound prompts and messages are inspected before they enter the agent's context. This is the right place to detect prompt injection attempts, filter personally identifiable information (PII), and screen for inputs that violate policy before the agent begins reasoning about them.

tool.before: Tool calls are evaluated before execution. An agent about to run a shell command, make an API request, or write a file hits this checkpoint first. This is where dangerous commands can be blocked before any damage occurs.

tool.after: Tool output is evaluated after execution. Even when a tool call looks safe, the response can expose secrets or sensitive data. This checkpoint catches that.

The three-stage model mirrors how security controls work in other execution environments: inspect the input, inspect the action, inspect the result. The idea is not new, but it is good to see it applied cleanly to an AI agent framework.
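The three-checkpoint flow can be pictured as a single agent step that consults an inspection callback at each stage. This is a minimal sketch, not the platform's code: the checkpoint names come from the article, but `run_agent_step`, the `inspect` callback, and the `{"tool", "command"}` payload shape are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical callback type: (checkpoint name, payload) -> "allow" or "block".
Inspect = Callable[[str, Dict[str, Any]], str]


def run_agent_step(
    prompt: str,
    plan: Callable[[str], Dict[str, Any]],
    execute: Callable[[Dict[str, Any]], str],
    inspect: Inspect,
) -> Optional[str]:
    """Run one prompt -> tool call -> tool output cycle through three checkpoints."""
    # message.before: inspect the input before it enters the agent's context.
    if inspect("message.before", {"text": prompt}) == "block":
        return None
    # tool.before: inspect the planned tool call before it executes.
    call = plan(prompt)
    if inspect("tool.before", call) == "block":
        return None
    # tool.after: inspect the tool's output after it has run.
    output = execute(call)
    if inspect("tool.after", {"output": output}) == "block":
        return None
    return output


# Stub components for illustration: block any shell call containing "rm -rf".
seen: List[str] = []
executed: List[str] = []


def demo_inspect(stage: str, payload: Dict[str, Any]) -> str:
    seen.append(stage)
    if stage == "tool.before" and "rm -rf" in payload.get("command", ""):
        return "block"
    return "allow"


def demo_plan(prompt: str) -> Dict[str, Any]:
    return {"tool": "shell", "command": prompt}


def demo_execute(call: Dict[str, Any]) -> str:
    executed.append(call["command"])
    return f"ran: {call['command']}"
```

Calling `run_agent_step("ls", demo_plan, demo_execute, demo_inspect)` passes all three checkpoints and returns `"ran: ls"`, while `"rm -rf /tmp/x"` is stopped at `tool.before`, so `demo_execute` is never called for it.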

Six Evaluator Tiers

The framework supports six types of evaluators, each suited to a different class of detection problem. They run in a fixed sequence ordered by performance cost, and they short-circuit on a block verdict.

regex (~1 µs): Pattern matching for secrets, credentials, PII, and dangerous command strings. Ships with pre-built rules for AWS keys, GitHub tokens, and common shell commands that should not run autonomously. Fastest tier by a wide margin.
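A regex tier of this kind amounts to a set of compiled patterns scanned against event text. The sketch below is illustrative only: the rule names and patterns are assumptions, not the rules the platform ships with. The AWS pattern targets access key IDs with the `AKIA`/`ASIA` prefixes and the GitHub pattern targets the classic `ghp_`-style token format; neither covers every credential format.

```python
import re
from typing import Dict, List

# Illustrative rules only; not the platform's shipped rule set.
RULES: Dict[str, re.Pattern] = {
    # AWS access key IDs: 4-letter prefix plus 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Classic GitHub tokens: ghp_/gho_/ghu_/ghs_/ghr_ plus 36 alphanumerics.
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # Shell commands a team might not want an agent running unattended.
    "rm_rf": re.compile(r"\brm\s+-rf\b"),
    "curl_pipe_shell": re.compile(r"\bcurl\b[^|\n]*\|\s*(?:ba|z)?sh\b"),
}


def regex_evaluate(text: str) -> List[str]:
    """Return the names of the rules whose pattern appears anywhere in text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]
```

Because each check is a single compiled-pattern search, this tier can run on every event at all three checkpoints without a noticeable cost.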

Sigma (~1 ms): Standard YAML threat detection rules, the same format used across the security industry. Sigma rules from the broader ecosystem can be mapped to OpenClaw events without modification. Teams with existing security operations centre (SOC) tooling can reuse what they already have.
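To show how a Sigma rule's detection logic could apply to an agent event, here is a sketch that evaluates a small subset of Sigma semantics: fields within a selection are ANDed, a list of values is ORed, the `contains`, `startswith`, and `endswith` modifiers are supported, and string matching is case-insensitive as in Sigma's default. The rule is written as the dict a YAML loader would produce. The field names `tool` and `command` are hypothetical mappings onto OpenClaw events, and compound conditions such as `selection and not filter` are not handled.

```python
from typing import Any, Dict

# A Sigma-style rule as a YAML loader would return it; field names are hypothetical.
RULE: Dict[str, Any] = {
    "title": "Download piped into a shell",
    "detection": {
        "selection": {
            "tool": "shell",
            "command|contains": ["curl", "wget"],
            "command|endswith": ["| sh", "| bash"],
        },
        "condition": "selection",
    },
}


def _match_value(actual: str, modifier: str, expected: str) -> bool:
    # Sigma compares strings case-insensitively by default.
    a, e = actual.lower(), expected.lower()
    if modifier == "contains":
        return e in a
    if modifier == "startswith":
        return a.startswith(e)
    if modifier == "endswith":
        return a.endswith(e)
    if modifier == "":
        return a == e
    raise ValueError(f"unsupported modifier: {modifier}")


def match_selection(event: Dict[str, Any], selection: Dict[str, Any]) -> bool:
    """Every field must match (AND); a list of values matches if any one does (OR)."""
    for key, expected in selection.items():
        field_name, _, modifier = key.partition("|")
        if field_name not in event:
            return False
        values = expected if isinstance(expected, list) else [expected]
        if not any(_match_value(str(event[field_name]), modifier, str(v)) for v in values):
            return False
    return True


def sigma_match(event: Dict[str, Any], rule: Dict[str, Any]) -> bool:
    detection = rule["detection"]
    # Only a bare selection name is supported as the condition in this sketch.
    return match_selection(event, detection[detection["condition"]])
```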

CEL (~1 ms): Common Expression Language for conditional policy evaluation. Provides full access to every field in an event with standard boolean logic. Useful for policies that regex cannot express cleanly.
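The kind of policy CEL suits is one that combines several event fields in a single boolean expression, which is awkward to state as a regex. The sketch shows a hypothetical policy in CEL syntax (in the comment) alongside the same logic as a plain Python predicate, so it runs without a CEL interpreter. The event fields `stage`, `tool`, and `args.path` and the `/workspace/` prefix are assumptions.

```python
from typing import Any, Dict

# Hypothetical policy in CEL syntax:
#   event.stage == "tool.before"
#     && event.tool == "write_file"
#     && !event.args.path.startsWith("/workspace/")
#
# The same boolean logic as a Python predicate (True means "policy violated").


def write_outside_workspace(event: Dict[str, Any]) -> bool:
    path = event.get("args", {}).get("path", "")  # a missing path counts as outside
    return (
        event.get("stage") == "tool.before"
        and event.get("tool") == "write_file"
        and not path.startswith("/workspace/")
    )
```

Treating a missing path as a violation is a fail-closed choice made for this sketch; a team could just as reasonably choose the opposite.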

SQL (~10 ms): In-memory SQLite for temporal queries. This is the most architecturally interesting evaluator tier. It enables rate limiting, burst detection, and session-level anomaly scoring, all of which require looking across multiple events rather than evaluating each one in isolation.
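Burst detection is a good example of why this tier needs state. The sketch below uses Python's built-in `sqlite3` with an in-memory database to flag a session that issues more than `limit` tool calls inside a sliding time window. The table schema, the `BurstDetector` name, and the thresholds are hypothetical; timestamps are passed in explicitly so the behaviour is deterministic.

```python
import sqlite3


class BurstDetector:
    """Flag a session that makes more than `limit` tool calls within `window_s` seconds."""

    def __init__(self, limit: int = 5, window_s: float = 10.0) -> None:
        self.limit = limit
        self.window_s = window_s
        # In-memory database: all history is lost when the process exits.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE events (session_id TEXT NOT NULL, stage TEXT NOT NULL, ts REAL NOT NULL)"
        )
        self.db.execute("CREATE INDEX idx_session_ts ON events (session_id, ts)")

    def record(self, session_id: str, stage: str, ts: float) -> bool:
        """Store the event, then return True if the session is now over the limit."""
        self.db.execute(
            "INSERT INTO events (session_id, stage, ts) VALUES (?, ?, ?)",
            (session_id, stage, ts),
        )
        (count,) = self.db.execute(
            "SELECT COUNT(*) FROM events "
            "WHERE session_id = ? AND stage = 'tool.before' AND ts > ?",
            (session_id, ts - self.window_s),
        ).fetchone()
        return count > self.limit
```

The same table supports other temporal questions (per-tool rates, distinct tools per session) by changing only the query, which is what makes a SQL tier more flexible than hard-coded counters.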

ML (~50 ms): Local ONNX (Open Neural Network Exchange) model inference for prompt injection detection, toxicity classification, and custom classifiers. Models run locally; no data leaves the machine. The ONNX runtime makes this tier accessible without requiring a hosted inference service.

LLM (~500 ms): Semantic evaluation using an LLM as a judge. Policy-driven, structured verdicts. Useful for decisions that require understanding intent rather than matching patterns. The latency cost means this should sit at the end of the chain, invoked only when the faster tiers have not already resolved the event.
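Structured verdicts matter because a judge's reply has to be validated before anything acts on it. A minimal sketch, assuming a JSON reply with `verdict` and `reason` fields (a hypothetical schema, not necessarily the platform's), that falls back to a fixed verdict when the reply is malformed:

```python
import json
from dataclasses import dataclass

ALLOWED_VERDICTS = {"allow", "detect", "redact", "block"}


@dataclass(frozen=True)
class JudgeVerdict:
    verdict: str
    reason: str


def parse_judge_reply(reply: str, fail_closed: bool = True) -> JudgeVerdict:
    """Validate an LLM judge's JSON reply; return a fallback verdict if it is malformed."""
    fallback = JudgeVerdict("block" if fail_closed else "allow", "unparseable judge reply")
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return fallback
    return JudgeVerdict(verdict, str(data.get("reason", "")))
```

Whether to fail closed (block) or open (allow) on a garbled reply is a policy decision; the `fail_closed` flag makes that choice explicit rather than accidental.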

The performance chain is a thoughtful design choice. The cheapest evaluators always run first and, on a block verdict, the remaining evaluators are skipped entirely. Regex catching a secret in a prompt means ML and LLM never wake up. This makes the overhead of running security checks predictable and manageable.
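The cost-ordered, short-circuiting chain can be sketched in a few lines. Only three of the six tiers are stubbed, with the article's rough latency figures as sort keys; the `Tier` and `run_chain` names are hypothetical, and continuing past a `detect` verdict is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    approx_cost_ms: float
    evaluate: Callable[[str], str]  # returns "allow", "detect" or "block"


def run_chain(text: str, tiers: List[Tier]) -> Tuple[str, List[str]]:
    """Run tiers cheapest first and stop at the first block; return (verdict, tiers run)."""
    verdict = "allow"
    ran: List[str] = []
    for tier in sorted(tiers, key=lambda t: t.approx_cost_ms):
        ran.append(tier.name)
        result = tier.evaluate(text)
        if result == "block":
            return "block", ran  # short-circuit: costlier tiers never run
        if result == "detect":
            verdict = "detect"  # assumption: a detect does not stop the chain
    return verdict, ran


# Stub tiers standing in for real evaluators.
TIERS = [
    Tier("llm", 500.0, lambda text: "allow"),
    Tier("regex", 0.001, lambda text: "block" if "AKIA" in text else "allow"),
    Tier("ml", 50.0, lambda text: "detect" if "ignore previous" in text.lower() else "allow"),
]
```

With these stubs, a prompt containing an AWS key is blocked after the regex tier alone, so the 500 ms judge never runs; a benign prompt pays for all three tiers.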

Two Deployment Modes

The framework offers two ways to sit inside an OpenClaw deployment, and the difference matters.

Shim plugin: Installs as a native OpenClaw plugin. It registers hooks and forwards events to a local Python evaluation server over HTTP. The shim can block at tool.before but only detect and alert at message.before and tool.after. This is a limitation of how OpenClaw's plugin hooks work: message.before and tool.after are fire-and-forget in the plugin model, so those stages cannot prevent an event, only observe it.
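On the receiving end of the shim, the evaluation server only needs to accept an event over HTTP and return a verdict. The sketch below uses Python's standard-library `http.server`. The `/evaluate` path, port 8765, the JSON event shape, and the placeholder `evaluate` policy are all hypothetical; the platform's actual API may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def evaluate(event: Dict[str, Any]) -> Dict[str, str]:
    """Placeholder policy; a real server would run the evaluator chain here."""
    payload = event.get("payload")
    command = str(payload.get("command", "")) if isinstance(payload, dict) else ""
    if event.get("stage") == "tool.before" and "rm -rf" in command:
        return {"verdict": "block"}
    return {"verdict": "allow"}


class EvalHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/evaluate":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(event, dict):
            self.send_error(400, "event must be a JSON object")
            return
        body = json.dumps(evaluate(event)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Bind to localhost only: the shim and the server run on the same machine.
    HTTPServer(("127.0.0.1", 8765), EvalHandler).serve_forever()
```

Note that for `message.before` and `tool.after` the shim would still receive a verdict from a server like this, but as the article explains, OpenClaw's hooks at those stages cannot act on a block.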

API proxy: Sits between OpenClaw and the Anthropic API as a reverse proxy on port 9920. Because it intercepts every request and response at the transport layer, it can block, redact, or detect at all three stages, including rewriting streamed responses. No plugin installation is required. OpenClaw does not need to be reconfigured beyond pointing it at the proxy instead of the Anthropic API directly.

Proxy mode is the more capable option; shim mode is the lower-friction one. The choice depends on whether tool-level blocking alone is sufficient or full coverage is required.
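The difference between the two modes can be written down as a capability table: a verdict the mode cannot enforce at a given checkpoint degrades to detect-and-alert. The table follows the article, with one assumption labelled in the code: the article does not say whether the shim can redact at `tool.before`, so this sketch conservatively allows only blocking there. The names `CAPABILITIES` and `enforceable_action` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

ALL_ACTIONS: FrozenSet[str] = frozenset({"block", "redact", "detect"})

# Actions each deployment mode can take at each checkpoint.
CAPABILITIES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("shim", "message.before"): frozenset({"detect"}),  # fire-and-forget hook
    # Assumption: shim redaction at tool.before is not described, so it is excluded.
    ("shim", "tool.before"): frozenset({"block", "detect"}),
    ("shim", "tool.after"): frozenset({"detect"}),  # fire-and-forget hook
    ("proxy", "message.before"): ALL_ACTIONS,
    ("proxy", "tool.before"): ALL_ACTIONS,
    ("proxy", "tool.after"): ALL_ACTIONS,
}


def enforceable_action(mode: str, stage: str, verdict: str) -> str:
    """Downgrade a verdict the mode cannot enforce at this stage to detect-and-alert."""
    if verdict == "allow":
        return "allow"
    return verdict if verdict in CAPABILITIES[(mode, stage)] else "detect"
```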

The Bring-Your-Own-Security Philosophy

One design decision worth noting: the framework does not ship with opinions about what constitutes dangerous behavior. It ships with the mechanism to evaluate events and the infrastructure to act on those evaluations. What counts as a block, a redact, or a detect is up to the team deploying it.

This is a deliberate choice, and a reasonable one. What is dangerous in an OpenClaw deployment varies by context. A developer environment running experimental workflows has different risk tolerances than a production system handling customer data. A framework that encodes a fixed threat model would be wrong for most of the environments it is deployed in.
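One way to picture "same mechanism, different threat model" is a per-environment policy table that maps the same detections to different actions. Everything here is hypothetical: the `dev`/`prod` policies, the rule names, and the `[REDACTED:…]` marker are illustrative choices, not platform defaults.

```python
import re
from typing import Dict, Tuple

SECRET_PATTERNS: Dict[str, re.Pattern] = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
}

# Hypothetical policies: identical detections, different responses.
POLICIES: Dict[str, Dict[str, str]] = {
    "dev": {"aws_access_key_id": "detect", "github_token": "detect"},
    "prod": {"aws_access_key_id": "block", "github_token": "redact"},
}

# Severity order used to pick the strongest action when several rules fire.
RANK = {"allow": 0, "detect": 1, "redact": 2, "block": 3}


def apply_policy(text: str, env: str) -> Tuple[str, str]:
    """Return (strongest action, text with any redactions applied) for env's policy."""
    policy = POLICIES[env]
    action = "allow"
    for name, pattern in SECRET_PATTERNS.items():
        if not pattern.search(text):
            continue
        rule_action = policy.get(name, "detect")  # unknown rules default to detect
        if rule_action == "redact":
            text = pattern.sub(f"[REDACTED:{name}]", text)
        if RANK[rule_action] > RANK[action]:
            action = rule_action
    return action, text
```

The detection code never changes between environments; only the policy table does, which is the separation the framework's bring-your-own-security design relies on.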

The trade-off is that the framework requires security judgment to be useful. The pre-built regex rules lower the barrier to getting started, but teams that want meaningful coverage will need to write evaluators that reflect their actual threat model.

What to Make of It

The OpenClaw Security Platform addresses a real gap. Agents that can execute shell commands and call external APIs without any inspection layer are a meaningful operational risk, and that risk grows as OpenClaw deployments move from developer experiments to production workloads.

The architecture is sound: three checkpoints, six evaluator tiers ordered by cost, two deployment modes for different integration constraints, a real-time dashboard, and a clean HTTP API for the evaluation server. For teams building on OpenClaw who have been waiting for a security story, this is a practical starting point.

Whether it is sufficient depends on what you are building. The shim plugin's inability to block at message.before and tool.after is a limitation that matters in high-risk deployments. The API proxy solves that, but adds a network hop and a new process to maintain. Neither mode offers persistence beyond in-memory state, which limits what the SQL evaluator can do across sessions.

These are not criticisms of the framework so much as an honest accounting of what it covers and what it does not. It is open source, available now, and fills a gap that OpenClaw itself has not addressed.

The project is at github.com/zenitysec/openclaw-security-platform.
