How I Built a PII Tokenization Middleware to Keep Sensitive Data Out of LLM APIs
Source: DEV Community
## The Problem I Kept Ignoring

Every time we sent a customer transcript to an LLM API, we were sending real data in plaintext to a third-party server: credit card numbers, home addresses, full names, national IDs. Most teams I've talked to handle this in one of two ways:

1. Ignore it and hope the provider's data processing agreement covers them.
2. Prompt engineer around it ("don't repeat personal information in your response"), which does nothing about what's already been transmitted.

Neither is acceptable in a production system handling real user data. So I built llm-hasher, a PII tokenization middleware that sits between your application and any LLM API.

## The Core Idea

The LLM doesn't need to see the actual credit card number to summarize a support transcript. It just needs to know that a credit card number was mentioned.

So instead of:

> "Hi, my card is 4111-1111-1111-1111 and email is [email protected]"

the LLM receives:

> "Hi, my card is CREDIT_CARD_john12_4f8a2b and email is EMAIL_john12_9c3d1a"