
OCSF: How a Common Security Data Schema Is Powering the Next Wave of AI-Ready SOCs

As security teams race to operationalize models, copilots, and agentic workflows, a less visible but foundational shift is happening underneath: vendors are converging on a shared way to describe security data. The Open Cybersecurity Schema Framework (OCSF) has quickly become one of the strongest contenders for that role, moving from a community project to real plumbing across major security products.

For security architects, SOC leaders, and data engineers, OCSF is increasingly the layer that determines how fast you can normalize telemetry, correlate signals, and plug AI into your workflows without drowning in custom parsers.

The core problem OCSF is trying to solve

Modern SOCs are stitched together from endpoint, identity, cloud, SaaS, and now AI infrastructure. Each of those layers emits its own flavor of logs and events. Even when tools are describing the same real-world action, they do it with different field names, nesting, and assumptions.

Take a basic investigation scenario: at 10:00 a.m. an employee logs in from San Francisco on a laptop; at 10:02 a.m., a cloud resource is accessed from New York using the same identity. That pattern might point to leaked credentials or session hijacking. But correlating those events in practice is hard when:

  • One product records a source_ip while another uses client_address
  • Device identifiers, user IDs, and session IDs are modeled differently in each system
  • There’s no shared notion of what the “actor,” “resource,” or “action” actually are
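To make the divergence concrete, here is a minimal Python sketch of the problem: two hypothetical vendor payloads describe the same login, and a per-vendor mapping is needed before anything can be correlated. The vendor field names and the normalized target shape are illustrative, not exact OCSF attribute names.

```python
# Two hypothetical vendors describing the same kind of login event.
# Field names below are illustrative, not taken from any real product.

def normalize_vendor_a(event: dict) -> dict:
    # Vendor A: flat payload, camelCase identifiers, "source_ip".
    return {
        "user_name": event["userName"],
        "src_ip": event["source_ip"],
        "session_id": event["sessionId"],
    }

def normalize_vendor_b(event: dict) -> dict:
    # Vendor B: nested identity object, "client_address".
    return {
        "user_name": event["identity"]["name"],
        "src_ip": event["client_address"],
        "session_id": event["session"]["id"],
    }

a = normalize_vendor_a(
    {"userName": "jdoe", "source_ip": "198.51.100.7", "sessionId": "s-1"}
)
b = normalize_vendor_b(
    {"identity": {"name": "jdoe"}, "client_address": "203.0.113.9",
     "session": {"id": "s-2"}}
)

# Only after normalization can correlation key on the same fields.
assert a["user_name"] == b["user_name"]
```

Every integration needs its own `normalize_vendor_*` function like this; a shared schema moves that mapping work upstream and makes it reusable.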

The result is a hidden tax on your security program. Teams spend disproportionate effort:

  • Rewriting and renaming fields on ingest
  • Building one-off normalizers for each product integration
  • Maintaining fragile ETL jobs every time a vendor changes their schema

This tax gets worse as you add AI-driven analytics and copilots. Those systems are only as good as the consistency of the data they see. OCSF’s purpose is to lower that tax by providing a common schema for security-relevant data, so vendor feeds and customer pipelines can meet in the middle.

OCSF in plain language: what it is and what it isn’t

OCSF is an open-source framework for cybersecurity schemas. It is explicitly:

  • Vendor neutral: designed by and for a broad ecosystem, not tied to a single platform
  • Format agnostic: OCSF defines structure and semantics, not whether you store it in JSON, Parquet, or anything else
  • Collection-agnostic: it doesn’t dictate how data is collected, only how it’s represented once collected

For application teams and data engineers, it’s essentially a shared data contract for security events, findings, objects, and their context. You map your raw telemetry into that contract once, and then:

  • Analysts work with a consistent language across tools for detection and investigation
  • Data engineers move events through lakes, pipelines, and SIEMs without constant schema rewrites
  • Vendors can interoperate more easily in customer environments

Inside a SOC, that dry-sounding standard translates into fewer bespoke transforms and more reusable content. Correlation rules, detections, and playbooks written over OCSF fields become portable across multiple data sources, instead of tightly coupled to whichever product happened to be deployed first.

How OCSF changes daily life inside a SOC

[Image: OCSF in daily SOC operations]

From the operator’s perspective, OCSF shows up in the unglamorous but critical parts of daily work: log onboarding, normalization, and investigation timelines.

Without a shared schema, every new telemetry source adds a new normalization project. With OCSF, the workflow shifts:

  1. Vendors map once: Tool providers map their native event structures to OCSF categories and classes.
  2. Ingest pipelines standardize: Your log collectors and brokers convert incoming data into OCSF as it lands in a data lake or SIEM.
  3. Analysts operate on one model: Detection content and queries target OCSF concepts (actor, action, resource, outcome) instead of vendor-specific field names.

Returning to the login example, the SOC doesn’t need to remember whether a specific product uses src or source_ip. Both get normalized to the same OCSF field. The correlation logic becomes about semantics (“same user, impossible travel pattern across locations”) rather than about decoding a dozen vendor schemas.
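Once events share one shape, the impossible-travel check from the earlier scenario reduces to semantics: same user, two locations, not enough time between them. The sketch below assumes events have already been normalized to common fields (the field names and the 900 km/h speed threshold are illustrative choices, not part of OCSF itself).

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class LoginEvent:
    # Assumes upstream normalization already mapped vendor fields here.
    user: str
    time_s: float   # epoch seconds
    lat: float
    lon: float

def km_between(a: LoginEvent, b: LoginEvent) -> float:
    # Haversine great-circle distance in kilometres.
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def impossible_travel(a: LoginEvent, b: LoginEvent,
                      max_kmh: float = 900.0) -> bool:
    # Flag if covering the distance would require faster-than-airliner speed.
    hours = abs(b.time_s - a.time_s) / 3600
    if hours == 0:
        return km_between(a, b) > 0
    return km_between(a, b) / hours > max_kmh

sf = LoginEvent("jdoe", 0, 37.77, -122.42)    # 10:00, San Francisco
ny = LoginEvent("jdoe", 120, 40.71, -74.01)   # 10:02, New York

assert impossible_travel(sf, ny)
```

The detection logic never mentions `src` or `source_ip`; it only sees the normalized model, which is exactly what makes it portable across sources.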

Over time, this shifts effort away from maintaining field mappings and toward higher-value tasks: building analytics, tuning detections, and developing response workflows that can work across your stack instead of just within one tool.

From niche effort to de facto standard: a fast two-year ramp

OCSF’s visible acceleration has largely happened in the last two years. The framework was announced in August 2022 by Amazon Web Services (AWS) and Splunk, building on schema work contributed by Symantec, a division of Broadcom, with a founding group of major infrastructure players including Cloudflare, CrowdStrike, IBM, Okta, Palo Alto Networks, Rapid7, Salesforce, Securonix, Sumo Logic, Tanium, Trend Micro, and Zscaler.

Since then, the community has grown rapidly. According to AWS, by August 2024 OCSF had expanded from an initial 17-company effort into a community with more than 200 participating organizations and 800 contributors. When OCSF joined the Linux Foundation in November 2024, contributor counts grew further, reaching around 900.

That growth has been matched by a steady cadence of releases over the past two years. The effect for practitioners is that OCSF has evolved from a promising idea into a framework with active governance, frequent updates, and wide vendor input—traits that matter if you’re planning to build long-lived pipelines and detection content on top of it.

Where OCSF already shows up in your tools


OCSF is no longer just a specification on paper; it’s showing up across observability and security products in ways that directly affect how you design architectures.

On the AWS side:

  • AWS Security Lake converts natively supported AWS logs and events into OCSF and stores them in Parquet format. That means data landing in the lake arrives already normalized to the schema.
  • AWS AppFabric can output standardized audit data in OCSF, streamlining normalization from supported SaaS applications.
  • AWS Security Hub uses OCSF for its findings and publishes an extension to capture cloud-specific resource details.

Beyond AWS:

  • Splunk can translate incoming data into OCSF using its edge processor and ingest processor, letting you normalize events closer to the source.
  • Cribl supports converting streaming data into OCSF and compatible formats, making it easier to standardize logs in motion.
  • Palo Alto Networks can forward Strata Logging Service data into Amazon Security Lake in OCSF form.
  • CrowdStrike participates on both ends of the pipeline: Falcon data can be translated into OCSF for Security Lake, and Falcon Next-Gen SIEM is positioned to ingest and parse OCSF-formatted events.

This level of adoption is rare for a security data standard. OCSF has effectively crossed the chasm from an abstract specification to the default operational schema in many pipelines, making it a practical design consideration rather than a future-looking bet.

Why AI is making a shared schema non-optional

AI is reshaping the attack surface and the telemetry landscape at the same time. When enterprises deploy AI infrastructure, large language models (LLMs) typically sit at the center, surrounded by:

  • Model gateways and API layers
  • Agent runtimes and orchestration frameworks
  • Vector stores and retrieval systems
  • Tooling integrations (ticketing, code repos, knowledge bases)
  • Policy and guardrail engines

Each of these components generates new kinds of security-relevant telemetry, often crossing product boundaries. For SOCs, the critical question shifts from “What did the model say?” to “What did the AI system actually do, across all its tools and data sources, and did that create a security incident?”

Examples include:

  • An AI assistant calling an unintended tool and triggering a sensitive workflow
  • An agent retrieving the wrong dataset from a vector store and leaking sensitive information
  • Policy engines being bypassed or misapplied due to complex chain-of-thought or tool-use patterns

Investigating those scenarios puts significant pressure on the underlying data model. You need to track not just prompts and responses, but the full chain of actions and decisions across systems. In that environment, a shared schema like OCSF becomes more valuable, especially when AI is also being used for analytics and correlation on the backend.

Inside the 2025 OCSF releases: making AI behavior observable


OCSF’s recent evolution has explicitly targeted these AI-centric challenges. In 2025, versions 1.5.0, 1.6.0, and 1.7.0 introduced updates aimed at making AI assistant behavior more observable and investigable.

Consider a company that uses an internal AI assistant to help employees look up documents and trigger tools like ticketing systems or code repositories. One day, the assistant starts:

  • Pulling the wrong files
  • Calling tools it should not access
  • Exposing sensitive details in its responses

Changes across OCSF 1.5.0 through 1.7.0 help security teams:

  • Flag unusual behaviors related to how the assistant interacts with systems
  • See who had access to the connected systems involved in the interaction
  • Trace tool calls step by step, reconstructing the chain of actions that led to the problem

Instead of just seeing the final text answer produced by the AI, investigators can follow the operational narrative: which tools were invoked, what data was accessed, and how that sequence deviated from normal or expected behavior. That level of traceability is essential when AI-driven incidents don’t neatly map to traditional endpoint or network alerts.
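Reconstructing that operational narrative is mechanically simple once the telemetry shares a schema: filter to one interaction, sort by time, and read off the chain. The sketch below uses assumed field names (`session`, `tool`, `target`), not OCSF's exact attribute names, to show the shape of the task.

```python
# Illustrative AI-assistant events, already normalized to one shape.
# Field names here are assumptions for the sketch, not OCSF attributes.
events = [
    {"session": "c-42", "time": 3, "tool": "code_repo.read",   "target": "deploy-keys"},
    {"session": "c-42", "time": 1, "tool": "doc_search.query", "target": "runbooks"},
    {"session": "c-42", "time": 2, "tool": "ticketing.create", "target": "INC-1009"},
    {"session": "c-99", "time": 1, "tool": "doc_search.query", "target": "faq"},
]

def tool_chain(session_id: str, evts: list[dict]) -> list[str]:
    # Filter to one interaction, order by time, and emit readable steps.
    steps = sorted((e for e in evts if e["session"] == session_id),
                   key=lambda e: e["time"])
    return [f'{e["tool"]} -> {e["target"]}' for e in steps]

chain = tool_chain("c-42", events)
# The investigator sees the full sequence of actions,
# not just the model's final text answer.
assert chain[-1] == "code_repo.read -> deploy-keys"
```

The hard part in practice is not this query but getting every component (gateway, agent runtime, vector store) to emit events in the shared shape in the first place, which is what the 2025 releases target.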

What’s next: deeper visibility into AI conversations and models

The upcoming OCSF 1.8.0 release is being developed with even richer AI observability in mind. Imagine a customer support bot that suddenly starts providing long, detailed answers including internal troubleshooting notes meant only for staff.

With the changes envisioned for OCSF 1.8.0, a security or incident response team would be able to see, for that interaction:

  • Which model handled the exchange
  • Which provider supplied the model
  • What role each message in the conversation played
  • How token counts evolved across the dialogue

A sudden spike in prompt or completion tokens might indicate that:

  • The bot received an unusually large hidden prompt
  • Too much background data was pulled from a vector database
  • An excessively long response increased the chance of sensitive data leaking
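With per-message token counts available in a consistent form, a spike check becomes a small statistics problem. This is a hedged sketch under assumed field names (`prompt_tokens`, `completion_tokens`) and an illustrative z-score threshold; OCSF 1.8.0's actual attribute names may differ.

```python
from statistics import mean, pstdev

def token_spikes(turns: list[dict], z_threshold: float = 3.0) -> list[int]:
    # Return indices of turns whose total token count deviates sharply
    # from the conversation's own baseline (simple z-score test).
    counts = [t["prompt_tokens"] + t["completion_tokens"] for t in turns]
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z_threshold]

# 16 ordinary turns, then one turn with an unusually large prompt
# (e.g. oversized hidden context pulled from a vector store).
turns = [{"prompt_tokens": 200, "completion_tokens": 150}] * 16
turns.append({"prompt_tokens": 9000, "completion_tokens": 4000})

assert token_spikes(turns) == [16]
```

Real deployments would baseline across many conversations rather than within one, but the point stands: the analytics are easy once the fields are consistent.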

OCSF doesn’t solve those issues by itself, but it gives investigators practical clues on where to look by encoding these details in a consistent way. Instead of being limited to the final answer, they can inspect the structural attributes of the interaction and pinpoint where it went off course.

What this means for your AI-ready SOC architecture

For the broader market, the key story is that OCSF has rapidly transitioned from a collaborative idea to a real, widely used standard. Over the past two years, it has gained:

  • Stronger community governance
  • Frequent, AI-aware releases
  • Concrete support across data lakes, ingest pipelines, SIEM workflows, and partner ecosystems

In a world where AI is expanding the threat landscape through new scams, abuse patterns, and attack paths, the ability to connect data from many systems without losing context is becoming foundational. OCSF is increasingly the schema layer that allows that connection to happen in a repeatable way.

For security architects and SOC leaders designing AI-driven workflows, this suggests several practical takeaways:

  • Treat OCSF as a first-class design choice when you evaluate tools, build pipelines, or plan data lake architectures.
  • Prioritize vendors and intermediaries (collectors, stream processors, SIEMs) that can emit or ingest OCSF natively.
  • Align detection content and playbooks with OCSF concepts so they can travel with you as tools change.

As AI systems become both powerful assets and novel attack surfaces, the SOCs that can most quickly reconstruct “what actually happened” across heterogeneous systems will have a decisive advantage. OCSF doesn’t remove that challenge, but it gives teams a shared data language to confront it.
