I want to walk through something specific, because when I talk to security teams about AI data leaks, they often picture vague, hypothetical scenarios. The reality is far more concrete.
Right now, somewhere in your organization, someone is pasting a database connection string into ChatGPT. The scenario usually goes like this: there is a production issue, they are under time pressure, and they need to write a query. So they copy the connection string from their config file and drop it into the chat along with their question. That string contains the host, port, username, and password for a production database.
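To make that concrete, here is a hypothetical PostgreSQL-style connection string (every value is made up) and what Python's standard-library `urllib.parse.urlsplit` recovers from it. Not every driver uses URL syntax (SQL Server, for example, commonly uses `key=value;` pairs), but the point holds either way: the string alone is enough to connect.

```python
from urllib.parse import urlsplit

# Hypothetical production connection string of the kind described above.
# All values are invented for illustration.
conn = "postgresql://app_user:S3cr3tPass@db.prod.internal:5432/orders"

parts = urlsplit(conn)

# Everything needed to connect is recoverable from the string alone.
exposed = {
    "host": parts.hostname,
    "port": parts.port,
    "username": parts.username,
    "password": parts.password,
    "database": parts.path.lstrip("/"),
}

for field, value in exposed.items():
    print(f"{field}: {value}")
```

Once a string like this is in a chat transcript, rotating the password is the only real remediation.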
What Actually Gets Pasted
We analyzed prompt patterns across organizations that use AI monitoring tools, and the categories of sensitive data that appear most often are remarkably consistent:
API keys and tokens. Developers paste API keys when asking for help with integrations. AWS access keys, Stripe secret keys, OAuth tokens, JWT secrets. They paste the key, ask "why is this returning a 403," and move on. The key is now in a third-party system.
Database credentials. Connection strings, SQL Server passwords, Redis auth tokens. Developers paste these when debugging queries or asking for schema advice. Some even paste entire .env files.
Source code with embedded secrets. This is the most common pattern. Someone pastes a code block for review or debugging help, and that code block contains hardcoded credentials, internal URLs, or configuration values that reveal infrastructure details.
Customer data. Support teams paste customer conversations. Sales teams paste CRM records. HR pastes employee records. In every case, the person is trying to work faster, and the AI tool is genuinely helpful. But the data leaving the organization includes names, emails, phone numbers, account numbers, and sometimes financial details.
Internal documents. Strategy decks, board presentations, legal memos, M&A analysis. Employees paste these into AI tools for summarization, editing, or analysis. The content is often highly sensitive and would be classified as confidential or restricted under any reasonable data classification scheme.
Why This Happens
It is tempting to blame employees, but that misses the point. People paste sensitive data into AI tools because AI tools are extraordinarily useful. A developer who gets a working answer in 30 seconds is not going to spend 20 minutes sanitizing their input first. The friction of doing the right thing is too high, and the perceived risk is too low.
Most employees do not think of ChatGPT as a "third-party system" the way they think of email or file sharing. It feels like a private conversation. The interface is designed to feel that way. But every prompt is transmitted to and processed by external infrastructure, and depending on the provider and plan, it may be used for model training.
The Scale of the Problem
In 2023, Samsung banned ChatGPT and similar generative AI tools on company devices after engineers leaked source code through it. At least three separate incidents occurred within a matter of weeks, exposing semiconductor equipment data, meeting notes, and source code for a proprietary program. That is one company whose incidents became public. For every incident that makes the news, many more almost certainly go undetected.
One widely cited industry study found that 11% of the data employees pasted into ChatGPT was confidential. Not sensitive. Confidential. And the employees doing it were not malicious. They were trying to get their jobs done.
Detection and Prevention
Traditional DLP tools were built primarily for email attachments and file transfers. Most are not designed to inspect text typed or pasted into a browser-based web application. To catch credentials and PII before they reach an AI provider, you need detection at the browser layer.
Effective detection looks for specific patterns in real time:
- Strings matching known API key formats (AWS access key IDs begin with AKIA, Stripe live secret keys with sk_live_)
- Connection strings with embedded credentials
- Social Security numbers, credit card numbers, and other structured PII
- Email addresses and phone numbers in bulk
- Code blocks containing hardcoded secrets
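The rules above translate fairly directly into pattern matching. The sketch below is a simplified, hypothetical detector written in Python for readability (a real browser-layer tool would run as extension JavaScript), and it is not InvestigAItor's implementation. The category names, the minimum Stripe key length, the bulk-email threshold of five, and the secret-assignment heuristic are all assumptions chosen for illustration. One refinement is worth copying: candidate card numbers are only reported if they pass the Luhn checksum, which filters out most random digit runs. `AKIAIOSFODNN7EXAMPLE` is the placeholder key ID from AWS's own documentation.

```python
import re

# Hypothetical detection rules; real products tune these far more carefully.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_secret_key": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b"),
    # scheme://user:password@host, e.g. postgresql://u:p@db:5432
    "connection_string_with_credentials": re.compile(
        r"\b[A-Za-z][A-Za-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@[^\s/]+"
    ),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # password = "..." style assignments inside pasted code
    "hardcoded_secret": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
BULK_EMAIL_THRESHOLD = 5  # hypothetical cut-off for "in bulk"


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_prompt(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs found in a prompt before submission."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((category, match.group(0)))
    # Only report digit runs that pass Luhn, to cut false positives.
    for match in CARD_CANDIDATE.finditer(text):
        if luhn_valid(match.group(0)):
            findings.append(("payment_card_number", match.group(0)))
    # A single address is normal; many at once suggests pasted customer data.
    emails = EMAIL.findall(text)
    if len(emails) >= BULK_EMAIL_THRESHOLD:
        findings.append(("bulk_email_addresses", f"{len(emails)} addresses"))
    return findings


prompt = (
    "Why does this return a 403? key=AKIAIOSFODNN7EXAMPLE "
    "db=postgresql://app_user:S3cr3tPass@db.prod.internal:5432/orders "
    "card 4111 1111 1111 1111"
)
for category, value in scan_prompt(prompt):
    print(category, "->", value)
```

With the sample prompt, the sketch reports the AWS key ID, the credential-bearing connection string, and the test card number. Plain regexes trade precision for speed; dedicated secret scanners often add entropy checks and surrounding context to reduce false positives.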
InvestigAItor's PII and credential detection runs in the browser before the prompt is submitted. When sensitive data is detected, the organization's policy determines the response: log the event, warn the user, require manager approval, or block the submission entirely. The goal is not to stop people from using AI. It is to stop sensitive data from leaving the organization through AI channels.
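One way to model a policy like this is as an ordered set of actions where the strictest applicable rule wins. The sketch below assumes that design. The action names mirror the four responses listed above (plus an ALLOW default for clean prompts), while the category labels and the action assigned to each are hypothetical examples, not InvestigAItor's configuration format.

```python
from enum import IntEnum


class Action(IntEnum):
    # Ordered from least to most restrictive, so max() picks the strictest.
    ALLOW = 0
    LOG = 1
    WARN = 2
    REQUIRE_APPROVAL = 3
    BLOCK = 4


# Hypothetical example policy mapping detection categories to responses.
POLICY = {
    "aws_access_key_id": Action.BLOCK,
    "stripe_live_secret_key": Action.BLOCK,
    "connection_string_with_credentials": Action.BLOCK,
    "payment_card_number": Action.REQUIRE_APPROVAL,
    "us_ssn": Action.REQUIRE_APPROVAL,
    "bulk_email_addresses": Action.WARN,
    "hardcoded_secret": Action.WARN,
}
DEFAULT_ACTION = Action.LOG  # unknown categories are still recorded


def decide(categories: list[str]) -> Action:
    """Return the strictest action required by any detected category."""
    if not categories:
        return Action.ALLOW
    return max(POLICY.get(c, DEFAULT_ACTION) for c in categories)


print(decide([]).name)                               # ALLOW
print(decide(["bulk_email_addresses"]).name)         # WARN
print(decide(["us_ssn", "aws_access_key_id"]).name)  # BLOCK
```

Defaulting unknown categories to LOG rather than ALLOW means a newly added detector produces an audit trail even before anyone writes a policy for it.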
A Practical Starting Point
If you are not monitoring AI prompts today, start with visibility. Deploy monitoring in passive mode and review what comes back after a week. You will almost certainly find credentials, customer data, and internal documents flowing through AI tools. That data gives you the evidence to build policy and the urgency to enforce it.
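Passive mode amounts to recording detections without acting on them, then reviewing the aggregate. A minimal sketch of that weekly review follows, assuming a hypothetical event record with `user`, `tool`, and `category` fields; real monitoring products define their own export formats, and the sample records are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptEvent:
    # Hypothetical passive-mode log record: the prompt was NOT blocked,
    # the monitor only recorded what it would have flagged.
    user: str
    tool: str
    category: str


def summarize(events: list[PromptEvent]) -> tuple[Counter, Counter, int]:
    """Aggregate a review window of events by detection category, by AI tool,
    and by the number of distinct users involved."""
    by_category = Counter(e.category for e in events)
    by_tool = Counter(e.tool for e in events)
    distinct_users = len({e.user for e in events})
    return by_category, by_tool, distinct_users


# Made-up sample records standing in for a week of passive monitoring.
events = [
    PromptEvent("dev-17", "chatgpt", "aws_access_key_id"),
    PromptEvent("dev-17", "chatgpt", "connection_string_with_credentials"),
    PromptEvent("support-04", "other_ai_tool", "bulk_email_addresses"),
    PromptEvent("dev-22", "chatgpt", "aws_access_key_id"),
]

by_category, by_tool, distinct_users = summarize(events)
print(by_category.most_common())
print(by_tool.most_common())
print(f"{distinct_users} distinct users")
```

Counting distinct users alongside raw events helps separate one habitual source from an organization-wide pattern, which matters when deciding between targeted training and a blocking policy.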