
Securiti, Databricks Team Up to Protect Proprietary AI Training


Securiti and Databricks are partnering to tackle the complex challenge of ensuring the vast amounts of corporate data needed to build enterprise AI applications and agents are secure and comply with privacy regulations.

Enterprise proprietary data is crucial for organizations that need AI tools that address their specific needs, but putting such data into AI systems fuels worries about it being exposed.

“What we’re consistently hearing from our enterprise customers is that their greatest challenge lies in safely and reliably using data from diverse systems while ensuring proper controls and governance throughout the AI pipeline,” Securiti CEO Rehan Jalil told MSSP Alert. “This is particularly critical since the majority of an organization's data exists in unstructured form, making proper governance and control of these assets essential.”

Securiti’s alliance with Databricks will give organizations, MSSPs, and other channel partners the tools to ensure that such unstructured data is protected and remains compliant with government regulations when it’s used to develop AI systems.

Integrating Security and Data

With the partnership announced this week, the San Jose, California-based company is integrating Databricks’ Mosaic AI and Delta tables into its Gencore AI offering. Securiti launched almost seven years ago with its Data + AI Command Center platform for protecting corporate information and storing it per compliance requirements.

In October 2024, Securiti, whose Unified Partner Program includes MSSPs along with system integrators, resellers, and cloud service providers, unveiled Gencore AI, software that lets enterprise AI systems securely connect to myriad data systems and pull large volumes of structured and unstructured corporate data into generative AI applications.

The vendor’s partnership with Databricks is another step in that process. Databricks, based in San Francisco, offers a data lakehouse that can store structured, unstructured, and semi-structured data, and comes with an optimized storage layer called Delta Lake. Delta Lake is the default format for all operations on Databricks, and all tables on Databricks are Delta tables unless noted otherwise.
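For readers unfamiliar with Delta tables, a minimal PySpark sketch shows what writing and reading one looks like on Databricks; the table and column names here are illustrative, not drawn from the Securiti integration.

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession is already provided; building one here
# keeps the sketch self-contained.
spark = SparkSession.builder.appName("delta-demo").getOrCreate()

# saveAsTable() on Databricks produces a Delta table by default.
docs = spark.createDataFrame(
    [(1, "contract.pdf", "unstructured"), (2, "orders.csv", "structured")],
    ["doc_id", "source_file", "data_type"],
)
docs.write.mode("overwrite").saveAsTable("corp_docs")

# Reading it back is an ordinary table query; Delta manages the storage layer.
spark.table("corp_docs").show()
```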

Databricks’ Mosaic AI, a collection of machine learning tools acquired when it bought MosaicML for $1.3 billion in 2023, makes those Delta tables accessible to organizations’ AI models.

A 'Groundbreaking Approach'

“The combination of Databricks and Securiti introduces a groundbreaking approach to safe enterprise AI development,” Jalil said. “By seamlessly connecting to diverse unstructured and structured data sources, Securiti’s Gencore AI leverages Securiti's Data Command Graph to automatically select relevant datasets based on business context and compliance requirements. This automated data pipeline ensures not only data freshness and topical relevance but also implements real-time sanitization, including redaction and anonymization.”
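Jalil’s description of real-time sanitization maps onto a familiar pattern: scrubbing identifiers from text before it enters the AI pipeline. The sketch below illustrates the idea only; the regex patterns and the redact() helper are assumptions, not Securiti’s implementation.

```python
import re

# Illustrative patterns for two common identifier types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789"))
# -> "Contact [EMAIL REDACTED], SSN [SSN REDACTED]"
```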

Meshing the technologies provides enterprise-level AI security and governance that falls in line with OWASP’s security framework for large language model (LLM) applications, the CEO said, adding that “the solution automatically sanitizes data during ingestion, maintains entitlements from source systems, and enforces these entitlements at the AI consumption layer. Additionally, it safeguards embeddings in vector databases while monitoring and controlling prompts and responses.”
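The entitlement enforcement Jalil describes, carrying access controls from source systems through to the point where AI consumes the data, can be pictured as a filter on retrieved content. Here is a minimal, hypothetical sketch in which every name is an assumption:

```python
def filter_by_entitlement(chunks, user_groups):
    """Keep only chunks whose ACL intersects the requesting user's groups."""
    return [c for c in chunks if set(c["acl"]) & set(user_groups)]

# Retrieved content carries ACLs propagated from the source systems.
chunks = [
    {"text": "Q3 revenue summary", "acl": ["finance"]},
    {"text": "Public press release", "acl": ["all-employees"]},
]

# A general employee only sees the press release; the finance document
# never reaches the model's context.
print(filter_by_entitlement(chunks, ["all-employees"]))
```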

Security and Compliance are Necessities

Security and compliance are key considerations for enterprises that want to use proprietary data to train their AI models, according to Outshift, Cisco’s internal incubation unit for such emerging technologies as AI and quantum computing.

In a blog post last fall, Rose Merced, content strategy leader for Outshift by Cisco, noted that while base LLMs that are trained primarily on publicly available data continue to expand their capabilities, enterprise generative AI applications need the accuracy and relevancy that access to in-house corporate data provides.

“LLMs may seem like magic, but in the end, they can only be as good as the data on which they were trained,” Merced wrote. “Adding proprietary data to a model makes it aware of business-specific needs and tasks. When an LLM is fine-tuned on domain-specific data, it is much more effective at providing accurate and relevant answers to domain-specific queries.”

However, organizations need to ensure data security and regulatory compliance when they include proprietary information in AI model training. That means applying the same policies and security measures to data destined for AI systems that they apply to that data everywhere else.

“Your data is not externally exposed if you run your entire fine-tuning pipeline internally, making this a more protective option,” she wrote. “However, if you use a third-party service like OpenAI, then you need to make sure you trust them with your in-house data.”
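To make the third-party path concrete: a fine-tuning job against a service such as OpenAI starts by shipping training data to the provider, which is precisely the trust decision Merced flags. A minimal sketch, assuming the openai Python SDK and an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Uploading training data hands a copy of it to the provider; this is
# the trust decision Merced describes. train.jsonl holds chat-format
# training examples.
upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Model name is illustrative; check the provider's docs for current options.
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```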

Merced added that data privacy and security practices must adhere to all applicable rules, regulations, and policies, and that clear documentation and open communication, along with auditing and monitoring at each step, are critical.
