Redact PII from Traces
This feature is available only to Enterprise users and only through the Python SDK.
Some organizations process Personally Identifiable Information (PII) such as names, phone numbers, and email addresses in their Large Language Model (LLM) workflows. Storing this data in Weights & Biases (W&B) Weave poses compliance and security risks.
The Sensitive Data Protection feature automatically redacts PII from a trace before it is sent to Weave servers. It integrates Microsoft Presidio into the Weave Python SDK, so you control redaction settings at the SDK level.
The Sensitive Data Protection feature introduces the following functionality to the Python SDK:
- A redact_pii setting, which can be toggled on or off in the weave.init call to enable PII redaction.
- Automatic redaction of common entities when redact_pii=True.
- Customizable redaction fields using the configurable redact_pii_fields setting.
Enable PII redaction
To get started with the Sensitive Data Protection feature in Weave, complete the following steps:
- Install the required dependencies:
    pip install presidio-analyzer presidio-anonymizer
- Modify your weave.init call to enable redaction. When redact_pii=True, common entities are redacted by default:
    import weave
    weave.init("my-project", settings={"redact_pii": True})
- (Optional) Customize redaction fields using the redact_pii_fields setting:
    weave.init("my-project", settings={"redact_pii": True, "redact_pii_fields": ["CREDIT_CARD", "US_SSN"]})
For a full list of the entities that can be detected and redacted, see PII entities supported by Presidio.
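The steps above can be condensed into a small helper that builds the settings dictionary passed to weave.init. This is a minimal sketch: build_redaction_settings and init_with_redaction are hypothetical names introduced for illustration, and only the redact_pii and redact_pii_fields keys come from the steps above.

```python
from typing import Any, Dict, Optional, Sequence


def build_redaction_settings(fields: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Return a settings dict that enables PII redaction.

    Hypothetical helper, not part of Weave. When `fields` is None, only
    `redact_pii` is set, so the default entities apply.
    """
    settings: Dict[str, Any] = {"redact_pii": True}
    if fields is not None:
        # Copy into a list so callers can pass any sequence (tuple, list, ...).
        settings["redact_pii_fields"] = list(fields)
    return settings


def init_with_redaction(project: str, fields: Optional[Sequence[str]] = None) -> None:
    """Hypothetical wrapper: call weave.init with redaction enabled."""
    # Imported lazily so build_redaction_settings works without Weave installed.
    import weave

    weave.init(project, settings=build_redaction_settings(fields))
```

With these helpers, init_with_redaction("my-project", ["CREDIT_CARD", "US_SSN"]) would pass the same settings as the customized weave.init call in the last step.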
Entities redacted by default
The following entities are automatically redacted when PII redaction is enabled:
CREDIT_CARD
CRYPTO
EMAIL_ADDRESS
ES_NIF
FI_PERSONAL_IDENTITY_CODE
IBAN_CODE
IN_AADHAAR
IN_PAN
IP_ADDRESS
LOCATION
PERSON
PHONE_NUMBER
UK_NHS
UK_NINO
US_BANK_NUMBER
US_DRIVER_LICENSE
US_PASSPORT
US_SSN
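When building a custom redact_pii_fields list, it can help to see which requested entities already appear in the default list above and which do not. The following sketch copies that list into a constant and partitions a requested list accordingly. DEFAULT_REDACTED_ENTITIES and partition_fields are hypothetical names, not part of the Weave SDK, and the check does not verify that an entity is supported by Presidio.

```python
from typing import Iterable, List, Tuple

# Copied from the list of entities redacted by default.
# Illustrative constant only; Weave does not export a name like this.
DEFAULT_REDACTED_ENTITIES = frozenset({
    "CREDIT_CARD",
    "CRYPTO",
    "EMAIL_ADDRESS",
    "ES_NIF",
    "FI_PERSONAL_IDENTITY_CODE",
    "IBAN_CODE",
    "IN_AADHAAR",
    "IN_PAN",
    "IP_ADDRESS",
    "LOCATION",
    "PERSON",
    "PHONE_NUMBER",
    "UK_NHS",
    "UK_NINO",
    "US_BANK_NUMBER",
    "US_DRIVER_LICENSE",
    "US_PASSPORT",
    "US_SSN",
})


def partition_fields(fields: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split `fields` into (names in the default list, names not in it).

    Input order is preserved within each returned list.
    """
    in_defaults: List[str] = []
    not_in_defaults: List[str] = []
    for name in fields:
        if name in DEFAULT_REDACTED_ENTITIES:
            in_defaults.append(name)
        else:
            not_in_defaults.append(name)
    return in_defaults, not_in_defaults
```

Names that land in the second list are worth checking against the Presidio entity list, since they may be either valid non-default entities or typos.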
Usage information
- This feature is only available in the Python SDK.
- Enabling redaction increases processing time, because trace data is analyzed by Presidio before it is sent.
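To gauge the added processing time for a particular workload, one option is to time the same function in two separate runs, one with redaction enabled and one without. The helper below is a generic, standard-library sketch; mean_seconds is a hypothetical name, and the helper knows nothing about Weave itself.

```python
import time
from typing import Any, Callable


def mean_seconds(fn: Callable[..., Any], *args: Any, repeats: int = 5, **kwargs: Any) -> float:
    """Call fn(*args, **kwargs) `repeats` times and return the mean wall-clock seconds per call."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats
```

Comparing the value for a traced function with redact_pii enabled against a run without it gives only a rough estimate; results will vary with input size and hardware.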