
Redacting PII

Important

This feature is only accessible via the Python SDK.

Some organizations process Personally Identifiable Information (PII), such as names, phone numbers, and email addresses, in their Large Language Model (LLM) workflows. Storing this data in Weights & Biases (W&B) Weave can create compliance and security risks.

The Sensitive Data Protection feature lets you automatically redact PII from a trace before it is sent to Weave servers. The feature integrates Microsoft Presidio into the Weave Python SDK, so you control redaction settings at the SDK level.

The Sensitive Data Protection feature introduces the following functionality to the Python SDK:

  • A redact_pii setting, which you toggle on or off in the weave.init call to enable PII redaction.
  • Automatic redaction of common entities when redact_pii is set to True.
  • Customizable redaction fields through the redact_pii_fields setting.

Enable PII redaction

To get started with the Sensitive Data Protection feature in Weave, complete the following steps:

  1. Install the required dependencies:

    pip install presidio-analyzer presidio-anonymizer
  2. Modify your weave.init call to enable redaction. When redact_pii=True, common entities are redacted by default:

    import weave

    weave.init("my-project", settings={"redact_pii": True})
  3. (Optional) Customize the redacted entities using the redact_pii_fields setting:

    weave.init(
        "my-project",
        settings={"redact_pii": True, "redact_pii_fields": ["CREDIT_CARD", "US_SSN"]},
    )

    For a full list of the entities that can be detected and redacted, see PII entities supported by Presidio.

Entities redacted by default

The following entities are automatically redacted when PII redaction is enabled:

  • CREDIT_CARD
  • CRYPTO
  • EMAIL_ADDRESS
  • ES_NIF
  • FI_PERSONAL_IDENTITY_CODE
  • IBAN_CODE
  • IN_AADHAAR
  • IN_PAN
  • IP_ADDRESS
  • LOCATION
  • PERSON
  • PHONE_NUMBER
  • UK_NHS
  • UK_NINO
  • US_BANK_NUMBER
  • US_DRIVER_LICENSE
  • US_PASSPORT
  • US_SSN
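
If you want to keep the defaults above and also redact additional entities, one option is to pass the combined list explicitly. The sketch below is illustrative only: build_redaction_settings is a hypothetical helper, not part of the Weave SDK, and it assumes that redact_pii_fields may replace the default list rather than extend it, which this page does not confirm. Passing the union is harmless under either behavior. US_ITIN is used as an example of a Presidio entity that is not in the default list.

```python
# Entity names copied from the default list on this page.
DEFAULT_PII_FIELDS = [
    "CREDIT_CARD", "CRYPTO", "EMAIL_ADDRESS", "ES_NIF",
    "FI_PERSONAL_IDENTITY_CODE", "IBAN_CODE", "IN_AADHAAR", "IN_PAN",
    "IP_ADDRESS", "LOCATION", "PERSON", "PHONE_NUMBER", "UK_NHS",
    "UK_NINO", "US_BANK_NUMBER", "US_DRIVER_LICENSE", "US_PASSPORT", "US_SSN",
]


def build_redaction_settings(extra_fields=()):
    """Hypothetical helper: build a settings dict for weave.init.

    Combines the default entities with any extras, normalizing names to
    upper case and dropping duplicates while preserving order.
    """
    fields = list(DEFAULT_PII_FIELDS)
    for name in extra_fields:
        normalized = name.strip().upper()
        if normalized and normalized not in fields:
            fields.append(normalized)
    return {"redact_pii": True, "redact_pii_fields": fields}


settings = build_redaction_settings(["US_ITIN", "us_ssn"])
print(settings["redact_pii_fields"])

# In a real project (requires the weave package):
# import weave
# weave.init("my-project", settings=settings)
```

The helper only builds the dictionary; it does not check that each name is an entity Presidio recognizes, so a misspelled name would pass through unchanged.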

Redacting sensitive keys

In addition to PII redaction, the Weave SDK also supports redaction of custom keys via the sanitize helpers. This is useful when you want to protect additional sensitive data that might not fall under the PII category but needs to be kept private. Examples include:

  • API keys
  • Authentication headers
  • Tokens
  • Internal IDs
  • Config values

Default redacted keys

Weave automatically redacts the following sensitive keys by default:

["api_key", "auth_headers", "authorization"]

Adding your own keys

You can extend this list with your own custom keys that you want to redact from traces:

import weave
from weave.utils import sanitize

client = weave.init("my-project")

# Add custom keys to redact
sanitize.add_redact_key("token")
sanitize.add_redact_key("client_id")
sanitize.add_redact_key("whatever_else")

token = "secret_token_123"
client_id = "123"
whatever_else = "456"

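@weave.op()
def test(token, client_id, whatever_else):
    # Redaction matches on key names, so the values are passed as
    # named op inputs, which are recorded on the trace.
    return 1

test(token, client_id, whatever_else)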

When viewed in the Weave UI, the values of token, client_id, and whatever_else will appear as "REDACTED":

token = "REDACTED"
client_id = "REDACTED"
whatever_else = "REDACTED"
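
To make the idea of key-based redaction concrete, here is a minimal standalone sketch of the technique. It is not Weave's implementation: redact_keys is a hypothetical function, and both the recursion into nested dictionaries and lists and the exact placeholder handling are assumptions made for illustration. The case-insensitive match mirrors the behavior described for should_redact on this page.

```python
REDACTED_VALUE = "REDACTED"


def redact_keys(data, keys_to_redact):
    """Return a copy of `data` with values replaced wherever the key
    matches one of `keys_to_redact` (compared case-insensitively)."""
    lowered = {key.lower() for key in keys_to_redact}

    def _walk(value):
        if isinstance(value, dict):
            return {
                k: REDACTED_VALUE
                if isinstance(k, str) and k.lower() in lowered
                else _walk(v)
                for k, v in value.items()
            }
        if isinstance(value, list):
            return [_walk(item) for item in value]
        # Scalars and other types are returned unchanged.
        return value

    return _walk(data)


call_inputs = {
    "token": "secret_token_123",
    "client_id": "123",
    "request": {"Authorization": "Bearer abc", "prompt": "hello"},
}
redacted = redact_keys(
    call_inputs,
    {"api_key", "auth_headers", "authorization", "token", "client_id"},
)
print(redacted)
```

Note that matching is by key name only: a secret stored under a key that is not on the list, or embedded inside a longer string value, is left untouched by this approach.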

Managing redacted keys

The sanitize module provides additional functions to manage redacted keys:

from weave.utils import sanitize

# Get the current list of redacted keys
current_keys = sanitize.get_redact_keys()
print(current_keys) # {'api_key', 'auth_headers', 'authorization', 'token', 'client_id', 'whatever_else'}

# Remove a key from the redaction list (if needed)
sanitize.remove_redact_key("whatever_else")

# Check if a specific key will be redacted (case-insensitive)
will_redact = sanitize.should_redact("TOKEN") # Returns True

Usage information

  • This feature is only available in the Python SDK.
  • Enabling redaction increases processing time, because Presidio analyzes trace data before it is sent to Weave servers.
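
To gauge the overhead in your own workload, one rough approach is to time the same function with redaction enabled and again with it disabled, then compare the two numbers. The sketch below shows only the timing part: mean_latency and sample_op are hypothetical names, and sample_op is a plain stand-in for a function you would decorate with @weave.op() in a real project. It assumes you run the script once per configuration.

```python
import time


def mean_latency(fn, repeats=100):
    """Return the mean wall-clock time in seconds per call of `fn`."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def sample_op():
    # Stand-in for a function decorated with @weave.op() in a real project.
    return "Contact jane@example.com or call 555-0100"


latency = mean_latency(sample_op)
print(f"{latency * 1e6:.1f} microseconds per call")
```

Results depend on payload size and on which entities are enabled, so measure with inputs that resemble your production traces.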