Organizations already struggle to keep employees from risky behaviors that can result in a data breach. Now, generative AI presents a whole new threat – employees who accidentally input sensitive corporate or consumer data into ChatGPT.
As more organizations adopt generative AI in the workplace, 15% of employees regularly paste data into the tool, according to research LayerX released last year. Of those who share information with ChatGPT, 6% admit they’ve shared sensitive data.
Now security teams have a new worry: how to keep employees from inputting personally identifiable information and proprietary corporate information into generative AI tools.
Sharing personal data puts the organization at risk of violating any number of data compliance laws. Organizations that want to add generative AI to their toolbox need to build security protocols designed to prevent leaks of sensitive information.
Putting guardrails in place
The truth about AI, particularly generative AI, is that while it presents risk for companies, it also offers substantial benefits. It’s up to each organization to recognize where those benefits can turn into risk.
There’s a need to put guardrails in place that allow organizations to do business safely while embracing AI, said Max Shier, VP and CISO with Optiv.
“Everybody is trying to find that balance between enablement and risk mitigation, especially as it relates to privacy laws and protecting company confidential information,” said Shier.
Generative AI used within any organization needs policies and controls designed to protect data.
The best-case scenario is that a company does not onboard ChatGPT and similar tools unless it already has a mature security program, with data loss prevention tools in place and AI-specific user awareness training, Shier said.
CISOs and CIOs will need to balance the need to restrict sensitive data from generative AI tools with the need for businesses to use these tools to improve processes and increase productivity.
They have to do this all while staying compliant with the alphabet soup of rules and regulations.
The “easy” answer is to ensure that sensitive data does not find its way into LLMs – and that doesn’t mean just training data, said John Allen, VP of cyber risk and compliance at Darktrace, in an email interview.
“Many offerings from popular LLMs specifically state that any data you provide via prompts and/or feedback will be used to tune and improve their models," said Allen. "However, enforcing this limitation on sensitive data is easier said than done.”
There are two areas of emphasis when looking at ensuring data privacy in generative AI use, according to Craig Jones, VP of security operations at Ontinue, in an email interview.
Ensuring compliant data handling:
Organizations need to rigorously assess and control how LLMs handle data, ensuring alignment with the General Data Protection Regulation, HIPAA (the U.S. federal law restricting the release of medical information) and the California Consumer Privacy Act.
This involves employing strong encryption, consent mechanisms and data anonymization techniques, alongside regular audits and updates to data handling practices.
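The anonymization step Jones describes can be sketched as a pre-processing pass that redacts common PII patterns before a prompt ever leaves the organization. This is a minimal, hypothetical illustration using a few regex patterns; real data loss prevention tooling relies on far broader and more reliable detection.

```python
import re

# Illustrative PII patterns only; production DLP tools use much more
# sophisticated detection than these three regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected PII with labeled placeholders before the prompt
    is sent to an external generative AI service."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

print(redact("Contact Jane at jane.doe@example.com, SSN 123-45-6789."))
# → Contact Jane at [EMAIL REDACTED], SSN [SSN REDACTED].
```

A filter like this would typically sit in a proxy or browser extension between employees and the AI tool, so redaction happens regardless of which application originates the prompt.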
Securing sensitive data:
Ensuring the security of sensitive data involves employing a multilayered security approach, including encryption at rest and in transit, strict access controls and continuous monitoring for anomalies.
In case of a breach, rapid response and remediation measures need to be in place, along with clear communication to affected stakeholders in line with legal and regulatory requirements.
The lessons learned from such incidents should be integrated into improving the data security framework to better address future scenarios.
Generative AI vendors and security tool makers are adding subscription tiers with enhanced privacy protections and building out APIs that keep sensitive data from leaving the company’s systems. That data is not used to train other AI models.
“In fact, many vendors will also enter into data processing agreements and business associate agreements in order to meet specific compliance requirements for handling sensitive data,” said Allen.
In addition to the generative AI usage policies organizations write to protect sensitive data, AI companies are also stepping up: adding security controls like encryption and obtaining security certifications such as SOC 2.
But this is still new territory, and security teams are trying to learn what happens when sensitive data finds its way into a model, how to find that data and how to delete it – especially for PII under strict data compliance regulations.
“The use of generative AI tools is ultimately still in its infancy and there are still many questions that need to be addressed to help ensure data privacy is respected and organizations can remain compliant,” said Allen.