Why human risk data needs a data lakehouse
In modern human risk management, delivering the right insights and recommendations requires assembling the right data and making it available to the right decision-makers right away. Many platforms limit what data can be captured or constrain it as it’s being captured. That makes it hard for security practitioners to do high-value work such as building personalized risk profiles, viewing behavioral timelines, detecting changes over time, and correlating cross-platform signals down to the individual employee.
As we were building Fable, we knew we needed to integrate thousands of data points from hundreds of sources and transform that data into something accurate and meaningful for our customers, while also preserving the captured data in its original format. We modeled our data lakehouse on the medallion architecture, an industry-standard pattern in modern data engineering.
These insights, and the recommendations they enable, are possible because we’ve organized our data lakehouse into a three-layer pipeline: bronze, silver, and gold. Each layer plays a distinct role in turning messy inputs into precise, actionable findings.
Our approach: A bronze, silver, and gold data lakehouse pipeline
The bronze layer: capture everything; lose nothing
The bronze layer is our raw landing zone for exact API responses. Here, we ingest data as nested JSON blobs from across the human attack surface: security event logs, phishing simulations, email gateways, endpoint detections, policy compliance records, HR data, workspace events, and more.
The key at this stage is fidelity: we preserve the original data exactly as we receive it, schema quirks and all. This “store first, shape later” approach means we never lose potentially valuable context, even if we don’t yet know how we’ll use it.
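To make “store first, shape later” concrete, here is a minimal Python sketch of a bronze-layer writer. It is an illustration under stated assumptions, not Fable’s implementation. The `BronzeStore` and `BronzeRecord` names, the metadata fields, and the source labels are all hypothetical, and an in-memory list stands in for real object storage. The important property is that the raw bytes are stored untouched, with only ingestion metadata alongside them.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BronzeRecord:
    """One raw API response, kept exactly as received (hypothetical schema)."""

    source: str       # hypothetical source label, e.g. "phishing_sim"
    received_at: str  # ISO-8601 UTC ingestion timestamp
    sha256: str       # hash of the raw bytes, useful for dedup and audits
    raw: bytes        # the untouched response body; never parsed here


class BronzeStore:
    """Hypothetical append-only store; a list stands in for object storage."""

    def __init__(self) -> None:
        self._records: list[BronzeRecord] = []

    def ingest(self, source: str, raw: bytes) -> BronzeRecord:
        # "Store first, shape later": no parsing, validation, or schema checks,
        # so even malformed or unexpected payloads are preserved.
        record = BronzeRecord(
            source=source,
            received_at=datetime.now(timezone.utc).isoformat(),
            sha256=hashlib.sha256(raw).hexdigest(),
            raw=raw,
        )
        self._records.append(record)
        return record

    def records(self, source: Optional[str] = None) -> list[BronzeRecord]:
        # Records come back in arrival order; nothing is ever updated in place.
        return [r for r in self._records if source is None or r.source == source]
```

For example, `BronzeStore().ingest("edr", b"not even json")` succeeds and keeps the payload byte for byte. Deferring all interpretation to the silver layer is what lets a later schema change or a new use case reprocess history instead of losing it.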
The silver layer: make it consistent and connected
The silver layer is where raw chaos becomes usable. We flatten nested data without discarding any fields, normalize formats, and correct quality issues. We also join data points from disparate systems: for example, linking an endpoint alert to the employee who uses the device, or enriching a phishing click with the employee’s role, tenure, and past phishing simulation performance. Finally, we filter out obvious noise so downstream models and analytics don’t get tripped up by irrelevant events. The result is a unified, queryable view of human risk events across the enterprise. This layer is the difference between “we have the data” and “we can ask meaningful questions.”
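The silver-layer steps above (lossless flattening, normalization, noise filtering, and joining to employee context) can be sketched in a few small Python functions. This is a hypothetical illustration, not Fable’s pipeline: the field names `user`, `action`, and `ts`, the `NOISE_ACTIONS` set, and the HR lookup keyed by email are all assumptions made for the example.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any

# Hypothetical noise filter: event types that carry no human-risk signal.
NOISE_ACTIONS = {"heartbeat", "health_check"}


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into dotted keys, keeping every leaf value.

    Empty containers are kept as values so nothing silently disappears.
    Assumes source keys do not themselves contain "." or "[".
    """
    flat: dict[str, Any] = {}
    if isinstance(obj, dict) and obj:
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list) and obj:
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        flat[prefix] = obj
    return flat


def normalize(row: dict[str, Any]) -> dict[str, Any]:
    """Normalize hypothetical fields: lowercase emails, epoch seconds to ISO-8601 UTC."""
    out = dict(row)
    if isinstance(out.get("user"), str):
        out["user"] = out["user"].strip().lower()
    if isinstance(out.get("ts"), (int, float)):
        out["ts"] = datetime.fromtimestamp(out["ts"], tz=timezone.utc).isoformat()
    return out


def to_silver(raw_events: list[bytes], hr_by_email: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn raw bronze payloads into flat, normalized rows joined with HR context."""
    rows = []
    for raw in raw_events:
        row = normalize(flatten(json.loads(raw)))
        if row.get("action") in NOISE_ACTIONS:
            continue  # drop obvious noise before analytics sees it
        # Left join: events for unknown employees are kept, just without HR fields.
        hr = hr_by_email.get(row.get("user"), {})
        row.update({f"hr.{k}": v for k, v in hr.items()})
        rows.append(row)
    return rows
```

The join is deliberately a left join, so an event from an employee missing in the HR feed still reaches analytics rather than being silently dropped, which mirrors the bronze layer’s “lose nothing” principle.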
The gold layer: deliver insights that drive action
The gold layer is where human risk data becomes actionable intelligence. Here, we apply advanced processing, analytics, and machine learning to identify patterns, score risk, and trigger interventions. A phishing click becomes a risk score adjustment; a policy violation becomes a two-way chat; anomalous behavior across multiple systems or from a foreign country triggers a just-in-time security briefing. The gold layer is tightly coupled to our platform’s agentic intervention capability, so insights don’t just sit in dashboards; they actively shape behavior.
By combining these three layers, we get a complete picture of human risk. For example, repeated phishing missteps combined with excessive access privileges can reveal an employee’s rising risk.
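The gold layer described above uses analytics and machine learning; the Python sketch below swaps in simple hand-set rules purely to show the shape of the idea: events adjust a score, some events map to interventions, and a composite check flags rising risk only when repeated phishing misses and excessive access occur together. The weights, event names, intervention labels, and the `EmployeeRisk` type are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical hand-set weights; a production system might learn these instead.
EVENT_WEIGHTS = {"phishing_click": 10.0, "policy_violation": 5.0, "foreign_login": 8.0}

# Hypothetical mapping from gold-layer events to interventions.
INTERVENTIONS = {"policy_violation": "two_way_chat", "foreign_login": "jit_security_briefing"}


@dataclass
class EmployeeRisk:
    email: str
    score: float = 0.0
    phishing_misses: int = 0
    excessive_access: bool = False  # assumed to come from an access-review feed
    interventions: list[str] = field(default_factory=list)


def apply_event(risk: EmployeeRisk, event_type: str) -> None:
    """Adjust the score and queue any intervention for one silver-layer event."""
    risk.score += EVENT_WEIGHTS.get(event_type, 0.0)
    if event_type == "phishing_click":
        risk.phishing_misses += 1
    if event_type in INTERVENTIONS:
        risk.interventions.append(INTERVENTIONS[event_type])


def is_rising_risk(risk: EmployeeRisk, miss_threshold: int = 2) -> bool:
    """Composite signal: repeated phishing misses AND excessive access together."""
    return risk.phishing_misses >= miss_threshold and risk.excessive_access
```

Neither signal alone trips `is_rising_risk`; an employee with several phishing misses but appropriately scoped access is not flagged. Correlations like this only work because the silver layer has already tied both signals to the same person.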
Why this architecture matters
This layered bronze-silver-gold approach matters because human risk data is messy, siloed, and often context-dependent. Without the bronze layer, you lose historical detail that could be vital in an investigation. Without silver, you can’t reliably connect behaviors across systems and people. And without gold, you can’t put insights into action in a way that changes outcomes. Together, these layers ensure that every security-relevant human action, whether a click, a login, or a policy acknowledgment, is part of a coherent, actionable risk picture.
What human risk use cases are possible
Because our medallion-based pipeline keeps the data clean, connected, and context-rich, it enables capabilities that would otherwise be difficult or impossible. Some examples of human risk use cases are:
- Behavioral trend analysis: Identify departments where phishing susceptibility is increasing month over month.
- Precision interventions: Trigger a targeted briefing for an employee who failed a simulated phishing test and recently had a risky browser download.
- Risk-informed policy changes: Highlight patterns where security policies are routinely bypassed, so leaders can address root causes rather than just symptoms.
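The first two use cases can be sketched as small queries over silver-layer data. The Python below is a hypothetical illustration, not Fable’s query layer: it assumes simulation results arrive as `(department, "YYYY-MM", clicked)` tuples, treats “increasing month over month” as a strictly rising click rate across every observed month, and assumes the failed-simulation and risky-download sets have already been filtered to a recent time window.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def click_rates(results: Iterable[tuple[str, str, bool]]) -> dict[str, dict[str, float]]:
    """Per-department, per-month click rate from (department, "YYYY-MM", clicked) rows."""
    sent: dict[tuple[str, str], int] = defaultdict(int)
    clicked: dict[tuple[str, str], int] = defaultdict(int)
    for dept, month, did_click in results:
        sent[(dept, month)] += 1
        clicked[(dept, month)] += int(did_click)
    rates: dict[str, dict[str, float]] = defaultdict(dict)
    for (dept, month), n in sent.items():
        rates[dept][month] = clicked[(dept, month)] / n
    return rates


def rising_departments(results: Iterable[tuple[str, str, bool]]) -> list[str]:
    """Departments whose click rate rises strictly across every observed month."""
    rising = []
    for dept, by_month in click_rates(results).items():
        # "YYYY-MM" strings sort chronologically; gaps between months are not checked.
        series = [by_month[m] for m in sorted(by_month)]
        if len(series) >= 2 and all(b > a for a, b in zip(series, series[1:])):
            rising.append(dept)
    return sorted(rising)


def needs_briefing(failed_sim: set[str], risky_download: set[str]) -> set[str]:
    """Employees who failed a simulation AND had a risky download (both pre-windowed)."""
    return failed_sim & risky_download
```

Using click rates rather than raw click counts keeps departments of different sizes, or months with different numbers of simulations, comparable.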
In human risk management, speed, accuracy, and context aren’t nice-to-haves; they’re the difference between stopping a breach and cleaning up after one. Our data lakehouse architecture ensures we always have the intelligence we need, when we need it, to keep our customers secure.