How OCSF Became the Common Security Data Language for the AI Era

The security world has spent the last year fixated on models, copilots, and agents. But beneath that visible layer, a quieter change is reshaping how defenses are actually built: vendors are converging on a shared way to describe security data. The Open Cybersecurity Schema Framework (OCSF) has quickly become one of the leading contenders for this role, and it is increasingly embedded in how modern security operations move, normalize, and analyze telemetry—especially as AI infrastructure spreads.

Why security teams needed a shared data language

Security operations centers (SOCs) have long depended on stitching together data from dozens of tools: endpoint protection, identity platforms, cloud providers, SaaS apps, and now AI systems. Each product tends to describe similar concepts—users, devices, sessions, alerts—in its own way, with different field names, nesting structures, and assumptions. The result is a constant tax on security teams.

To do even basic correlation, such as connecting a suspicious login to a later cloud action, teams must normalize data by hand or write custom parsers and translation logic. Consider a simple scenario: an employee logs into a laptop in San Francisco at 10:00 a.m., then, two minutes later, accesses a critical cloud resource from New York. That pattern could signal a leaked credential or session hijacking. But turning this into a reliable detection means reconciling very different log formats and event models across identity, endpoint, and cloud tools.
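
Once both logins share one shape, the San Francisco to New York pattern becomes a speed check: could one person plausibly cover that distance in that time? The Python sketch below assumes each login has already been normalized into a record with a user, a timestamp, and a latitude/longitude. The `Login` type, the helper names, and the 900 km/h threshold are hypothetical choices for illustration. They are not part of OCSF or any product.

```python
import math
from dataclasses import dataclass

# Hypothetical normalized login record; a real pipeline would fill this in
# from identity, endpoint, and cloud logs after translating their fields.
@dataclass
class Login:
    user: str
    time_s: float  # seconds since epoch
    lat: float
    lon: float

EARTH_RADIUS_KM = 6371.0

def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two login locations, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def impossible_travel(a: Login, b: Login, max_kmh: float = 900.0) -> bool:
    """Flag two logins by the same user whose implied speed exceeds max_kmh."""
    if a.user != b.user:
        return False
    hours = abs(b.time_s - a.time_s) / 3600.0
    distance = haversine_km(a, b)
    if hours == 0:
        return distance > 0  # simultaneous logins from different places
    return distance / hours > max_kmh

sf = Login("alice", 0.0, 37.7749, -122.4194)   # San Francisco, 10:00
ny = Login("alice", 120.0, 40.7128, -74.0060)  # New York, 10:02
print(impossible_travel(sf, ny))  # True
```

The check itself is only a few lines. In practice, the hard part is getting every source to populate `user`, `time_s`, `lat`, and `lon` consistently.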

This normalization work consumes time and attention that could otherwise go into investigations, analytics, and response automation. In a market where every organization is building its own patchwork of tools, a common schema for security events, findings, objects, and context has long felt like an unattainable ideal. OCSF is designed to make that ideal practical.

OCSF in plain language: what it is and how it works

OCSF is an open-source framework for cybersecurity schemas. It is intentionally vendor-neutral and agnostic to storage format, data collection methods, and ETL tooling. Rather than specifying how you must store or transport data, it focuses on what the data means and how it is structured.
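
To make "what the data means" concrete, here is roughly how a successful logon might look as an OCSF Authentication event, written as a Python dictionary. The attribute names (`class_uid`, `activity_id`, `type_uid`, `status_id`, `user`, `src_endpoint`, `metadata`) and the `type_uid = class_uid * 100 + activity_id` convention follow the published schema as we understand it. The event is trimmed to a handful of fields, and all values, including the product name and schema version string, are made up. Check the schema for the version you target before relying on any of it.

```python
# Hypothetical OCSF Authentication event: values are invented, fields trimmed.
AUTHENTICATION_CLASS_UID = 3002  # Authentication class (Identity & Access Management)
LOGON_ACTIVITY_ID = 1            # activity_id 1 = Logon

def type_uid(class_uid: int, activity_id: int) -> int:
    # OCSF derives type_uid from the class and activity identifiers.
    return class_uid * 100 + activity_id

event = {
    "category_uid": 3,
    "class_uid": AUTHENTICATION_CLASS_UID,
    "activity_id": LOGON_ACTIVITY_ID,
    "type_uid": type_uid(AUTHENTICATION_CLASS_UID, LOGON_ACTIVITY_ID),
    "time": 1735725600000,  # epoch milliseconds
    "severity_id": 1,       # Informational
    "status_id": 1,         # Success
    "user": {"name": "alice", "uid": "u-1234"},
    "src_endpoint": {"ip": "203.0.113.10",
                     "location": {"city": "San Francisco", "country": "US"}},
    "metadata": {"version": "1.3.0",
                 "product": {"name": "ExampleIdP", "vendor_name": "Example Corp"}},
}
print(event["type_uid"])  # 300201
```

Nothing in this event says how it was stored or shipped. It could be written as JSON, Parquet, or anything else, which is the storage-agnostic point made above.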

In practice, that means application teams, data engineers, and security analysts can agree on a shared set of event types, fields, and relationships. Tools can emit data that conforms to a common model, or translate their own proprietary schemas into OCSF. Downstream, SIEMs, data lakes, and analytics platforms can then consume that data without having to understand every vendor’s quirks.
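
Translating a proprietary schema into OCSF is mostly a field-mapping exercise. The sketch below maps two hypothetical vendor log formats, one flat with ISO timestamps and one nested with epoch seconds, into the same trimmed OCSF-style Authentication shape. The vendor field names and product names are invented for illustration. Only the OCSF-side attribute names are meant to mirror the schema.

```python
from datetime import datetime

def ms_from_iso(ts: str) -> int:
    """Convert an ISO 8601 UTC timestamp to epoch milliseconds (OCSF 'time')."""
    # Swap a trailing "Z" for "+00:00" so older Python versions can parse it.
    return int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp() * 1000)

def auth_event(time_ms: int, user: str, ip: str, success: bool, product: str) -> dict:
    """Build a trimmed OCSF-style Authentication (class_uid 3002) logon event."""
    return {
        "class_uid": 3002,
        "activity_id": 1,                  # Logon
        "type_uid": 3002 * 100 + 1,
        "time": time_ms,
        "status_id": 1 if success else 2,  # 1 = Success, 2 = Failure
        "user": {"name": user},
        "src_endpoint": {"ip": ip},
        "metadata": {"product": {"name": product}},
    }

# Hypothetical vendor A: flat PascalCase fields, ISO timestamps, string result.
def from_vendor_a(rec: dict) -> dict:
    return auth_event(ms_from_iso(rec["EventTime"]), rec["UserPrincipal"],
                      rec["ClientIP"], rec["Result"] == "SUCCESS", "VendorA IdP")

# Hypothetical vendor B: nested actor, epoch seconds, numeric outcome (0 = ok).
def from_vendor_b(rec: dict) -> dict:
    return auth_event(rec["ts"] * 1000, rec["actor"]["login"],
                      rec["ip"], rec["outcome"] == 0, "VendorB Agent")

a = from_vendor_a({"EventTime": "2025-01-01T10:00:00Z", "UserPrincipal": "alice",
                   "ClientIP": "203.0.113.10", "Result": "SUCCESS"})
b = from_vendor_b({"ts": 1735725600, "actor": {"login": "alice"},
                   "ip": "198.51.100.7", "outcome": 0})
print(a["time"] == b["time"], a["user"] == b["user"])  # True True
```

Downstream consumers only need to understand `auth_event`'s output, not either vendor's original layout.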

For SOCs, the impact shows up in day-to-day workflows. Instead of continually rewriting field mappings or designing one-off parsers for each new product, teams can rely on a consistent schema for core security concepts. That opens the door to more portable detections, reusable dashboards, and cross-tool automations. When a new product is added to the stack, the primary integration question increasingly becomes how it maps into OCSF, not how to invent yet another proprietary data model.
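
A portable detection is one written against the shared schema rather than against any single product's fields. As a minimal sketch, the rule below flags users with repeated failed logons inside a sliding time window. It reads only OCSF-style attributes (`class_uid`, `status_id`, `user.name`, `time`), so it would run unchanged over any source mapped into that shape. The threshold and window values are arbitrary assumptions, not recommendations.

```python
from collections import defaultdict

def repeated_failed_logons(events, threshold=5, window_ms=5 * 60 * 1000):
    """Return users with at least `threshold` failed logons within `window_ms`.

    Only OCSF-style fields are read, so the rule is independent of which
    vendor produced each event.
    """
    failures = defaultdict(list)
    for e in events:
        if e.get("class_uid") == 3002 and e.get("status_id") == 2:  # failed auth
            failures[e["user"]["name"]].append(e["time"])
    flagged = set()
    for user, times in failures.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Slide the window's left edge until it spans at most window_ms.
            while t - times[start] > window_ms:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(user)
                break
    return flagged

def failed(user, t):
    # Minimal hypothetical failed-logon event in OCSF-style shape.
    return {"class_uid": 3002, "status_id": 2, "user": {"name": user}, "time": t}

events = [failed("bob", i * 60_000) for i in range(5)] + [failed("alice", 0), failed("alice", 1000)]
print(repeated_failed_logons(events))  # {'bob'}
```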

For vendors, OCSF provides a clear target for interoperability. Mapping internal schemas to OCSF once can simplify customers’ ingestion paths into their existing SIEM and data lake environments. As more platforms speak this shared language, the value of aligning to it compounds across the ecosystem.

From idea to adoption: two years of rapid growth

The pace of OCSF’s evolution over the last two years has been unusually fast for an industry standard. The project was publicly announced in August 2022 by Amazon Web Services (AWS) and Splunk, building on work contributed by Symantec (part of Broadcom). Other security and technology companies joined at launch, including Cloudflare, CrowdStrike, IBM, Okta, Palo Alto Networks, Rapid7, Salesforce, Securonix, Sumo Logic, Tanium, Trend Micro, and Zscaler.

Since launch, the community around OCSF has expanded from a small consortium to a broad ecosystem effort. By August 2024, AWS reported that the initiative had grown from 17 companies to more than 200 participating organizations and 800 contributors. When OCSF joined the Linux Foundation in November 2024, the contributor count had risen to around 900. The move to the Linux Foundation brought more formal governance and a clearer path to long-term stewardship.

Alongside community growth, the project has maintained a steady cadence of releases. Over a relatively short period, new versions have added event categories, refined field definitions, and expanded the schema to cover emerging use cases. Rather than remaining a static document, OCSF has behaved like a living standard, shaped by active implementation feedback from vendors and practitioners.

Where OCSF is showing up in products and pipelines

The clearest sign that OCSF is becoming a real standard, rather than a whitepaper, is its presence in operational products and data pipelines across the security and observability ecosystem.

Within the AWS portfolio, OCSF plays multiple roles. Amazon Security Lake converts natively supported AWS logs and events into OCSF and stores them in Apache Parquet format, giving customers a normalized, analytics-ready view of security data. AWS AppFabric can output normalized audit data in OCSF, simplifying ingestion into downstream security tools. AWS Security Hub uses OCSF for its findings, and AWS publishes an extension that adds cloud-specific resource details while preserving the base schema.

Beyond AWS, other vendors are aligning their data paths with OCSF. Splunk can translate incoming data into OCSF using its Edge Processor and Ingest Processor capabilities, allowing organizations to normalize disparate feeds before they reach their core analytics layer. Cribl supports converting streaming data into OCSF and other compatible formats, giving data engineering teams a flexible way to standardize on OCSF in flight.

Security product vendors are also positioning themselves on both sides of the OCSF pipeline. Palo Alto Networks can forward Strata Logging Service data into Amazon Security Lake using OCSF, ensuring that telemetry from its platforms lands in customers’ lakes in a normalized form. CrowdStrike’s Falcon platform both emits and consumes OCSF: Falcon data can be translated into OCSF for ingestion into Security Lake, and Falcon’s Next-Gen SIEM is designed to ingest and parse OCSF-formatted data as an input.

Collectively, these integrations signal that OCSF has moved beyond an abstract standard and into the “plumbing” of how security data flows between tools. For architects building multi-vendor environments, it increasingly represents a default schema to plan around.

AI infrastructure is raising the stakes for shared schemas

As organizations deploy AI systems, the complexity of their security telemetry increases. Large language models (LLMs) are usually just one component in a larger mesh of services: model gateways, agent runtimes, vector stores, tool-calling layers, retrieval systems, and policy engines all work together to answer user requests. Each piece generates its own logs, and those signals often cross vendor boundaries.

For SOC teams, the key security questions are shifting. It is no longer sufficient to know what text an AI assistant produced. Investigations increasingly focus on what the system actually did: which tools it invoked, what data it retrieved, which policies applied, and whether a particular chain of actions led to data exposure or policy violations.

This makes the underlying data model more important. When an AI assistant calls the wrong tool, accesses unexpected resources, or composes a risky sequence of operations, that behavior must be captured in a way that can be correlated across identity, application, and infrastructure logs. AI is also being applied on the analytics side, where models attempt to correlate more data, faster, and at larger scale. Those models perform better and are easier to trust when the underlying event data is structured consistently.
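
Correlating across identity, application, and infrastructure logs is much simpler when every source shares a time field and a session identifier. The sketch below merges several per-source streams, each already sorted by `time`, into one ordered timeline for a single session using `heapq.merge`. It assumes each event carries a `session.uid` value. The `source` and `action` keys are illustrative labels only, not OCSF attribute names.

```python
import heapq

def unified_timeline(*streams, session_uid):
    """Merge per-source event streams (each already sorted by 'time') and keep
    only events tied to one session, producing a single cross-tool timeline."""
    merged = heapq.merge(*streams, key=lambda e: e["time"])
    return [e for e in merged if e.get("session", {}).get("uid") == session_uid]

def ev(t, session, source, action):
    # 'source' and 'action' are hypothetical labels used only for this example.
    return {"time": t, "session": {"uid": session}, "source": source, "action": action}

identity = [ev(1000, "s1", "identity", "logon")]
ai_gateway = [ev(2000, "s1", "ai_gateway", "tool_call:search_documents"),
              ev(2500, "s2", "ai_gateway", "tool_call:create_ticket")]
cloud = [ev(3000, "s1", "cloud", "object_read")]

for e in unified_timeline(identity, ai_gateway, cloud, session_uid="s1"):
    print(e["time"], e["source"], e["action"])
```

`heapq.merge` streams lazily, so the same approach scales to large inputs as long as each source is pre-sorted.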

In this environment, a shared security schema like OCSF becomes more valuable. It offers a common vocabulary to represent AI-related security events alongside traditional infrastructure and application telemetry, without forcing every product to invent its own incompatible representation of AI behavior.

How OCSF evolved in 2025 to capture AI behavior

OCSF’s more recent releases reflect this AI-centric shift. Through versions 1.5.0, 1.6.0, and 1.7.0, the framework has added capabilities intended to help security teams reconstruct what happened inside AI-powered workflows.

Consider an internal AI assistant that employees use to search documents and trigger actions like creating tickets or interacting with code repositories. If that assistant starts pulling the wrong documents, invoking tools it should not use, or revealing sensitive data in its responses, security teams need to see more than just the final output. They need to understand the chain of events: how the request was routed, which tools were called, who had access to each connected system, and where the process diverged from expected behavior.

The updates in these OCSF versions are designed to make that reconstruction possible. They support flagging unusual behavior, expressing which identities and systems were involved, and tracing tool calls step by step. In effect, they help preserve the context around AI actions so incident responders can follow the trail from an anomalous response back through the decisions and integrations that produced it.
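
Following the trail from an anomalous response back to its origin amounts to walking parent links between recorded steps. The sketch below assumes each step in an AI workflow is logged with its own identifier and a pointer to the step that triggered it. The `uid`, `parent_uid`, and `action` keys here are hypothetical stand-ins, not the attribute names introduced in OCSF 1.5.0 through 1.7.0.

```python
def trace_back(events, anomalous_uid):
    """Follow hypothetical parent links from an anomalous step back to the
    originating request, returning the chain root-first."""
    by_uid = {e["uid"]: e for e in events}
    chain, seen = [], set()
    current = by_uid.get(anomalous_uid)
    while current is not None and current["uid"] not in seen:
        seen.add(current["uid"])  # guard against cycles in malformed data
        chain.append(current)
        current = by_uid.get(current.get("parent_uid"))
    return list(reversed(chain))

events = [
    {"uid": "req-1", "parent_uid": None, "action": "user_prompt"},
    {"uid": "route-1", "parent_uid": "req-1", "action": "gateway_route"},
    {"uid": "tool-1", "parent_uid": "route-1", "action": "search_documents"},
    {"uid": "tool-2", "parent_uid": "route-1", "action": "create_ticket"},
    {"uid": "resp-1", "parent_uid": "tool-1", "action": "model_response"},
]
print([e["action"] for e in trace_back(events, "resp-1")])
```

Here the response traces back through the document search and the gateway routing decision to the original prompt. The unrelated `create_ticket` branch is left out of the chain.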

For security architects and data engineers, this means OCSF can serve as a unifying schema for both traditional security signals and AI-specific telemetry, enabling AI incident investigations to fit more naturally into existing SIEM and data lake workflows.

What’s next: deeper visibility into AI conversations and models

Work on OCSF is continuing in this direction. The planned changes for OCSF 1.8.0, as described by the project’s contributors, aim to expose more detail about how AI interactions unfold.

Imagine an AI customer support bot that suddenly starts giving highly detailed answers, including internal troubleshooting steps that were meant only for staff. With the enhancements under development for version 1.8.0, a security team would be able to see which model handled the exchange, which provider supplied that model, the role of each message in the conversation, and how token usage changed over the interaction.

A sharp increase in prompt or completion tokens could indicate that the system was given an unusually large hidden prompt, pulled in excessive context from a vector database, or generated an abnormally long response. Each of these might increase the risk of sensitive data leakage. With this level of granularity, OCSF would give investigators practical clues about where an interaction started to deviate, rather than leaving them to infer everything from a single logged response.
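
If per-message token counts are recorded, a first-pass check for the kind of spike described above can be simple. The sketch below flags any conversation turn whose total tokens exceed a multiple of the median of earlier turns. The `prompt_tokens` and `completion_tokens` keys are hypothetical placeholders, since the exact attribute names planned for 1.8.0 may differ. The factor of 3 and the three-turn warm-up are arbitrary assumptions.

```python
from statistics import median

def token_spikes(turns, factor=3.0, min_history=3):
    """Return indexes of turns whose total token count exceeds `factor` times
    the median total of all earlier turns in the same conversation."""
    flagged, totals = [], []
    for i, turn in enumerate(turns):
        total = turn["prompt_tokens"] + turn["completion_tokens"]
        # Only judge a turn once there is enough history for a baseline.
        if len(totals) >= min_history and total > factor * median(totals):
            flagged.append(i)
        totals.append(total)
    return flagged

turns = [
    {"prompt_tokens": 200, "completion_tokens": 100},
    {"prompt_tokens": 220, "completion_tokens": 100},
    {"prompt_tokens": 180, "completion_tokens": 100},
    {"prompt_tokens": 200, "completion_tokens": 110},
    {"prompt_tokens": 3500, "completion_tokens": 500},  # large injected context?
]
print(token_spikes(turns))  # [4]
```

A median baseline is used so that one earlier outlier does not mask a later spike. Splitting prompt and completion counts would further help distinguish oversized context from an unusually long answer.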

For organizations standardizing their security telemetry, these kinds of additions suggest that OCSF is evolving into a schema that can represent not only classic infrastructure events but also the internal mechanics of AI-driven workflows.

What this means for the broader security and data ecosystem

Stepping back, the trajectory of OCSF over the past two years points to a broader shift. What began as a community-driven effort has matured into an operational standard that many security products now rely on day to day. With stronger governance, frequent releases, and broad vendor participation, OCSF has become part of the default tooling conversation for data lakes, ingestion pipelines, SIEM deployments, and partner integrations.

As AI expands the attack surface—through new forms of abuse, scams, and attack paths—security teams need to connect signals from more systems without losing the context that makes those signals meaningful. OCSF’s role is to ensure that as telemetry crosses product and platform boundaries, its structure and semantics remain coherent enough for analytics, detections, and investigations to work reliably.

For security architects, SOC leaders, and data-focused practitioners, the implication is clear: schema design is now a strategic concern. Choosing to align with OCSF is not just a technical formatting decision but a way to future-proof how security data is modeled in an AI-centric environment. As more tools adopt OCSF and the schema continues to evolve to cover AI use cases, the practical benefits of this shared data language are likely to compound across the security stack.

Cary Huang

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.