- P-Filtering: Redacts words based on a pre-trained unigram language model that estimates how frequently each word is used
- Named Entity Recognition (NER): Detects names, cities, countries, and other geographical indicators with high accuracy
- Regex-based redaction: Our redaction engine identifies and redacts any text that matches a preset regular expression. Our default set of regex rules includes the patterns commonly used for credit card numbers, SSNs, and email addresses, among other PII and PCI data. We can further customize redaction to meet our customers' most stringent data obfuscation needs.
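The p-filter description above does not say how word frequency turns into a redaction decision. The sketch below assumes one plausible reading: words that a unigram model considers rare (probability below a threshold) are treated as likely PII and replaced. The toy counts, the add-one smoothing, the `threshold` value, and the `[REDACTED]` placeholder are all illustrative assumptions, not ASAPP's implementation.

```python
# Minimal p-filter sketch. ASSUMPTION: tokens that the unigram model considers
# rare (low probability) are treated as likely PII and redacted.
from typing import Dict


def unigram_probability(word: str, counts: Dict[str, int]) -> float:
    """Add-one smoothed unigram probability, so unseen words get a small nonzero value."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    return (counts.get(word, 0) + 1) / (total + vocab_size)


def pfilter(sentence: str, counts: Dict[str, int], threshold: float,
            placeholder: str = "[REDACTED]") -> str:
    """Replace every token whose smoothed unigram probability falls below `threshold`."""
    out = []
    for token in sentence.split():
        word = token.lower().strip(".,!?")
        if unigram_probability(word, counts) < threshold:
            out.append(placeholder)
        else:
            out.append(token)
    return " ".join(out)


# Toy counts standing in for a pre-trained model (hypothetical values).
TOY_COUNTS = {"my": 500, "name": 300, "is": 900}

print(pfilter("my name is Zelda", TOY_COUNTS, threshold=0.01))
# → my name is [REDACTED]
```

In practice the threshold trades recall against over-redaction: raising it catches more rare tokens but also removes legitimate uncommon words.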
Redaction Rules
The following tables outline the redaction rules ASAPP implements today and denote which rules are applied by default. If needed, a rule can be applied to only one participant in a conversation (agent or customer); by default, rules apply to utterances from all participants.
Regex-based Redaction
These are the redaction rules implemented by py-redaction. They are written in Python, and each rule is implemented as described in the last column:
Rule | Target | Mechanics | Description |
---|---|---|---|
digits 1-5 | digital | regex | Redacts sequences of N digits |
digits >=6 | digital | regex | Redacts sequences of N digits accounting for common separators |
digits7plus | digital/voice | regex | Redacts sequences of 7+ digits accounting for common separators |
digitsN-dates | digital | regex | Redacts sequences of exactly N digits accounting for common separators, except if they match a valid date |
digitsTranscribed | digital | regex | Redacts any sequence of numbers accounting for regular and spoken-form numbers |
digits9ssn | voice | regex | Redacts 9 digit SSN accounting for separators used in SSN |
creditCard | digital | Luhn check + regex | Runs a Luhn check (all valid CCNs pass it) on number sequences and checks that the number pattern belongs to a known credit card issuer. |
truncateDigitsN | voice | regex | Redacts sequences of N digits accounting for spoken-form numbers |
date | digital | regex | Redacts various date formats |
profanity | digital/voice | regex + known terms | Redacts words and phrases present in a list of known bad words |
voiceCCTruncate | voice | regex | Redacts any sequence of numbers whose length falls within the range of valid credit card number lengths (11-17 digits), accounting for spoken-form numbers |
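The creditCard rule combines a pattern match with a Luhn check. The sketch below shows that combination under stated assumptions: the candidate pattern (12-19 digits separated by optional single spaces or dashes), the issuer regex (a small illustrative subset covering Visa, Mastercard, Amex, and Discover prefixes), and the `[REDACTED]` placeholder are hypothetical and not the actual py-redaction rule. The Luhn algorithm itself is the standard one.

```python
import re

# Candidate runs of 12-19 digits, optionally separated by single spaces or dashes.
# The lookarounds keep a match from starting or ending inside a longer digit run.
CANDIDATE = re.compile(r"(?<!\d)(?<!\d[ -])\d(?:[ -]?\d){11,18}(?![ -]?\d)")

# Illustrative subset of issuer prefixes/lengths (Visa, Mastercard, Amex, Discover);
# NOT the rule's actual issuer list.
ISSUER = re.compile(r"4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d\d)\d{12}")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_credit_cards(text: str, placeholder: str = "[REDACTED]") -> str:
    def _sub(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        # Redact only if the number looks like a known issuer AND passes Luhn.
        if ISSUER.fullmatch(digits) and luhn_valid(digits):
            return placeholder
        return match.group(0)

    return CANDIDATE.sub(_sub, text)


print(redact_credit_cards("card 4111 1111 1111 1111 ok"))
# → card [REDACTED] ok
```

Requiring both checks keeps false positives low: a random 16-digit order number passes the Luhn check only about one time in ten, and it must also match an issuer prefix.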
For example, consider the following strings, which use different separators:
- 1234 56 789
- 123-45-6789
- 1.234-56 789
- 123456789
Each of these strings matches the 9-Digit Numeric pattern once separators are accounted for, so each would be redacted per the standard policy outlined above.
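A regular expression that accounts for separators can be sketched as follows. It assumes the separators are a single space, dot, or dash between digits, and uses a hypothetical `[REDACTED]` placeholder; the real rule may accept other separators. The lookarounds prevent the pattern from redacting nine digits that are part of a longer number.

```python
import re

# Exactly nine digits, each optionally followed by one separator (space, dot, or dash).
# The lookarounds reject matches that are part of a longer digit sequence,
# including one that continues after a separator.
NINE_DIGITS = re.compile(r"(?<!\d)(?<!\d[ .-])(?:\d[ .-]?){8}\d(?![ .-]?\d)")


def redact_nine_digits(text: str, placeholder: str = "[REDACTED]") -> str:
    return NINE_DIGITS.sub(placeholder, text)


for s in ["1234 56 789", "123-45-6789", "1.234-56 789", "123456789"]:
    print(redact_nine_digits(s))  # each prints [REDACTED]
```

A 10-digit phone number such as `123-456-7890` is left alone by this pattern, which is why separate rules (such as phoneNumber) handle longer sequences.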
N-Digit Numeric rules detect numbers that are spelled out (e.g., “five” rather than “5”) in addition to numerical digits. This capability is designed to catch transcribed numbers in voice calls that the ASR has not automatically converted to numerical digits.
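One way to count spoken-form and numeric tokens together is to treat each run of consecutive numeric tokens as a single number and redact the run when it contains at least a given number of digits. This is a minimal sketch: the word list (only “zero” through “nine”), the `min_digits` parameter, and the placeholder are assumptions, and a production rule would also need to handle forms such as “double five” or “twenty”.

```python
from itertools import groupby
from typing import Optional

# Hypothetical spoken-form lexicon; "oh" for zero is deliberately left out to avoid
# redacting the interjection.
SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                 "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def to_digits(token: str) -> Optional[str]:
    """Return the digits a token represents, or None if it is not numeric."""
    word = token.lower().strip(".,")
    if word.isdigit():
        return word
    return SPOKEN_DIGITS.get(word)


def redact_spoken_numbers(utterance: str, min_digits: int,
                          placeholder: str = "[REDACTED]") -> str:
    out = []
    # Group consecutive tokens by whether they are numeric (digits or digit words).
    for is_numeric, run in groupby(utterance.split(), key=lambda t: to_digits(t) is not None):
        run = list(run)
        if is_numeric and sum(len(to_digits(t)) for t in run) >= min_digits:
            out.append(placeholder)  # the whole run counts as one number
        else:
            out.extend(run)
    return " ".join(out)


print(redact_spoken_numbers("my number is five five five one two three four", 7))
# → my number is [REDACTED]
```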
PII Specific Rules
Rule | Target | Mechanics | Description |
---|---|---|---|
nerTags | digital/voice | NER | Uses machine learning to detect named entities such as NAME and ADDRESS |
| | digital | regex | Redacts any well-formed email address (e.g., abc@gmail.com) |
pfilterEmailVoice | voice | regex + pfilter | Applies the p-filter to sentences that match an email-like format |
phoneNumber | digital | regex | Redacts sequences of digits that could be phone numbers, accounting for common phone number formats (parentheses, dashes, etc.) |
phoneNumberVoice | voice | regex | Redacts all-digit words in text that contains phone number keywords (e.g., “phone number”, “callback number”) |
passcode | digital/voice | regex | Redacts all-digit words near a passcode keyword (e.g., “code”, “pin”) |
pfilterPassword | digital/voice | regex + pfilter | Redacts every word surrounding password keywords. Redacts every combination of: |
spelledOut | voice | regex | Redacts sequences of more than one spelled-out character, which may be some form of PII. |
spelledOutSingle | voice | regex | Redacts a single character when it is the only character present in a sentence, since this most likely means the customer is dictating their PII. |
ssnVoice | voice | regex | Redacts digits surrounding SSN keywords. |
crediCardVoice | voice | regex | Redacts all digits surrounding card keywords. |
dob | digital/voice | regex | Redacts valid dates surrounding keywords such as “birth”, “date of birth”, “born” |
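Several of the rules above (passcode, ssnVoice, crediCardVoice) redact digits that appear around a keyword. A minimal sketch of that idea follows. The keyword set, the token-distance `window` of 5, and the placeholder are hypothetical, since the table does not state how far "surrounding" extends.

```python
import re
from typing import Set

# Hypothetical keyword set and window size; the real rule's values are not given here.
PASSCODE_KEYWORDS = {"code", "pin", "passcode"}


def redact_digits_near_keywords(sentence: str, keywords: Set[str], window: int = 5,
                                placeholder: str = "[REDACTED]") -> str:
    tokens = sentence.split()
    # Normalize tokens for matching: drop non-word characters and lowercase.
    words = [re.sub(r"\W", "", t).lower() for t in tokens]
    keyword_positions = [i for i, w in enumerate(words) if w in keywords]
    out = []
    for i, (token, word) in enumerate(zip(tokens, words)):
        near_keyword = any(abs(i - k) <= window for k in keyword_positions)
        # Only all-digit tokens within the window are replaced.
        out.append(placeholder if near_keyword and word.isdigit() else token)
    return " ".join(out)


print(redact_digits_near_keywords("my pin is 4821 thanks", PASSCODE_KEYWORDS))
# → my pin is [REDACTED] thanks
```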
Immediate and Delayed Redaction
For Messaging and Voice Applications only, ASAPP offers a delayed redaction capability in addition to immediate redaction.
Immediate redaction can happen either on the front-end, if the channel uses an ASAPP SDK, or on the back-end through the ASAPP redaction service. If a channel does not use the ASAPP SDK, the front-end still shows raw, unredacted values to the customer; however, all data processed by the ASAPP back-end is still redacted.
For delayed redaction, ASAPP removes sensitive data after a fixed time period (1 hour by default), so that the data can be temporarily displayed to the agent. ASAPP stores this temporarily viewable data in a single cache, and each cache entry has its own Time-To-Live (TTL) that starts when the entry is first stored. When an entry's TTL expires, ASAPP automatically removes it from the cache and the data is no longer available.
Context-Aware Redaction
For AutoTranscribe and Voice Application products, ASAPP applies redaction triggered by specific words in transcribed utterances. After a trigger word is detected, redaction is applied for a pre-configured time window. This lets organizations target specific contexts instead of applying a rule to an entire conversation, which would over-redact useful information. By default, ASAPP applies two context-aware rules to all conversation transcripts created with AutoTranscribe and the Voice Application:
- Credit Card Numbers: After words related to credit cards and payments are detected, all numerical digits are redacted for a 180-second window.
- Social Security Numbers: After words related to social security numbers are detected, all numerical digits are redacted for a 180-second window.
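The trigger-plus-window behavior described above can be sketched as a single pass over time-stamped utterances. Assumptions: utterances arrive in chronological order with a start offset in seconds, a new trigger restarts the window, digits are masked with `*`, and the trigger word sets are hypothetical stand-ins for ASAPP's actual lists. Only the 180-second default comes from the text.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Utterance:
    start_seconds: float  # offset of the utterance from the start of the call
    text: str


# Hypothetical trigger words; the actual trigger lists are not documented here.
CREDIT_CARD_TRIGGERS = {"card", "credit", "payment"}
SSN_TRIGGERS = {"social", "ssn"}


def context_aware_redact(utterances: Iterable[Utterance], triggers: Set[str],
                         window_seconds: float = 180.0) -> List[Utterance]:
    """Mask every digit in utterances that start within `window_seconds` of the most
    recent trigger word. Assumes utterances arrive in chronological order."""
    window_end: Optional[float] = None
    out = []
    for u in utterances:
        words = {w.strip(".,?!").lower() for w in u.text.split()}
        if words & triggers:
            window_end = u.start_seconds + window_seconds  # a new trigger restarts the window
        text = u.text
        if window_end is not None and u.start_seconds <= window_end:
            text = re.sub(r"\d", "*", text)
        out.append(Utterance(u.start_seconds, text))
    return out


call = [
    Utterance(0, "my zip is 10001"),     # before any trigger: kept
    Utterance(10, "I will pay by card"),  # trigger opens a window ending at 190 s
    Utterance(20, "it is 4111 1111"),    # inside the window: digits masked
    Utterance(200, "my zip is 10001"),   # window has closed: kept
]
for u in context_aware_redact(call, CREDIT_CARD_TRIGGERS | SSN_TRIGGERS):
    print(u.text)
```

Scoping redaction to a window keeps digits outside payment or SSN discussions, such as the zip code at the end of the example call.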
How ASAPP Redacts the Non-Numeric Strings
ASAPP redacts by:
Dates
Date formats introduce additional difficulties for redaction, but they can still fundamentally be handled with regular expressions covering several possible formats. ASAPP typically handles the following formats:
Numeric-only date formats:
- European variants are also handled, e.g., 15/12/2018
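A sketch of numeric date redaction that accepts both US (MM/DD/YYYY) and European (DD/MM/YYYY) ordering follows. It uses a regex to find candidates and Python's `datetime` to reject impossible dates; the accepted separators (slash, dot, dash, required to be consistent within one date), the four-digit year, and the `[DATE]` placeholder are assumptions, not the formats ASAPP actually supports.

```python
import re
from datetime import datetime

# Two 1-2 digit fields and a 4-digit year, using the same separator (/ . or -) twice.
DATE_CANDIDATE = re.compile(r"\b(\d{1,2})([/.\-])(\d{1,2})\2(\d{4})\b")


def is_valid_date(first: str, second: str, year: str) -> bool:
    """Accept the date if either US (month first) or European (day first) order is valid."""
    for month, day in ((first, second), (second, first)):
        try:
            datetime(int(year), int(month), int(day))
            return True
        except ValueError:
            continue
    return False


def redact_dates(text: str, placeholder: str = "[DATE]") -> str:
    def _sub(m: re.Match) -> str:
        if is_valid_date(m.group(1), m.group(3), m.group(4)):
            return placeholder
        return m.group(0)  # looks like a date but is impossible: leave it

    return DATE_CANDIDATE.sub(_sub, text)


print(redact_dates("born 15/12/2018"))
# → born [DATE]
```

Validating with `datetime` rather than the regex alone is what lets a rule such as digitsN-dates tell a real date apart from an arbitrary digit group like `31/31/2018`.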