Data Handling
Data handling is the standardized context for how we want SDKs to help users filter data.
Sensitive Data
SDKs should not include PII or other sensitive data in the payload by default. When building an SDK, we may come across an API that provides useful information for debugging a problem. If that API returns data considered PII, we guard it behind a flag called Send Default PII. This is an SDK option called `send-default-pii`, and it is disabled by default. That means naturally sensitive data is not sent by default.
Some examples of data guarded by this flag:
- When attaching HTTP requests to events:
  - Request Body: "raw" bodies (bodies which cannot be parsed as JSON or formdata) are removed.
  - HTTP Headers: known sensitive headers such as `Authorization` or `Cookies` are removed too.
  - Note that if a user explicitly sets a request on the scope, nothing is stripped from that request. The above rules only apply to integrations that come with the SDK.
- User-specific information (e.g. the current user ID according to the web framework in use) is not sent at all.
- On desktop applications:
  - The username logged in to the device is not included. This is often a person's name.
  - The machine name is not included, for example `Bruno's laptop`.
- SDKs don't set `user.ip` to `{{auto}}`. That value instructs the server to keep the connection's IP address.
- Regarding IP addresses specifically, it's standard practice for services on the Internet to log the IP address of incoming connections. This allows security tools and operations teams to understand abuse coming from a single IP, such as spam bots, and it also lets developers see whether issues in their application are being triggered by a single malicious source.
Sentry server is always aware of the connecting IP address and can use it for logging on some platforms, namely JavaScript and iOS/macOS/tvOS.
All other platforms require the event to include `user.ip={{auto}}`, which happens if `sendDefaultPii` is set to true.
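The gating described above can be sketched as follows. This is a minimal illustration, not any SDK's actual implementation: the function names `build_request_data` and `build_user_data`, the header list, and the dict layout (including the `ip` key, which simply mirrors `user.ip` in the text) are assumptions made for this example.

```python
import json

# Assumed list for illustration; real SDKs maintain their own.
SENSITIVE_HEADERS = {"authorization", "cookie"}


def build_request_data(headers, raw_body, send_default_pii=False):
    """Return request data for an event, stripping what the flag guards."""
    if send_default_pii:
        # The user opted in: attach headers and body unchanged.
        return {"headers": dict(headers), "data": raw_body}

    request = {
        "headers": {
            name: value
            for name, value in headers.items()
            if name.lower() not in SENSITIVE_HEADERS
        }
    }
    # Keep only bodies that parse as JSON; "raw" bodies are dropped.
    # (Formdata parsing is omitted to keep the sketch short.)
    try:
        request["data"] = json.loads(raw_body)
    except (TypeError, ValueError):
        pass
    return request


def build_user_data(send_default_pii=False):
    """Ask the server to record the connection's IP only when opted in."""
    return {"ip": "{{auto}}"} if send_default_pii else {}
```

With the flag left at its default, `build_request_data({"Authorization": "Bearer x", "Accept": "*/*"}, "not json")` keeps only the `Accept` header and attaches no body.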
Before sending events to Sentry, SDKs should invoke callbacks. This allows users to remove any sensitive data client-side.
`before-send` and `event-processors` can be used to register a callback with custom logic to remove sensitive data.
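A callback of this kind might look like the sketch below. The key list and the `[Filtered]` marker are assumptions for illustration, and the event is treated as a plain dict. The `(event, hint)` signature follows the shape the Python SDK uses for `before_send`; other SDKs may differ.

```python
# Assumed key names for this sketch; choose ones that match your application.
SENSITIVE_KEYS = {"password", "token", "secret"}


def before_send(event, hint=None):
    """Mask assumed-sensitive keys in the event's `extra` data."""
    extra = event.get("extra", {})
    for key in list(extra):
        if key.lower() in SENSITIVE_KEYS:
            extra[key] = "[Filtered]"
    # Return the (possibly modified) event so that it is still sent.
    return event
```

In the Python SDK, for instance, a function of this shape can be passed as the `before_send` option to `sentry_sdk.init`, and returning `None` from it discards the event.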
Application State
App state can be critical to help developers reproduce bugs. For that reason, SDKs often collect app state and append it to events through auto instrumentation.
When attaching data that could potentially include sensitive data or PII, it's important to:
- Add a note in the docs to notify developers.
- Mark that part of the protocol on Relay as such. This allows data scrubbing to run on those fields.
Some examples of auto instrumentation that could attach sensitive data:
- A SQL integration that includes the query. If a user doesn't use parameterized queries, and appends sensitive data to it, the SDK could include that in the event payload.
- Desktop apps including window title.
- A web framework routing instrumentation attaching route `to` and `from`.
Structuring Data
For better data scrubbing on the server side, SDKs should save data in a structured way when possible. The starting point of the discussion was RFC-0038.
Spans
Structuring span data helps Relay know what kind of data it receives, which in turn helps with scrubbing sensitive data.
`http` spans containing URLs:
The description of spans with `op` set to `http` must follow the format `HTTP_METHOD scheme://host/path` (ex. `GET https://example.com/foo`). If the authority part of the URL contains credentials (`https://username:password@example.com`), the credentials must be omitted completely. If query strings or fragments are present in the URL, both are set in the data attribute of the span.

    span.setData({
      'http.query': url.getQuery(),
      'http.fragment': url.getFragment(),
    })

Additionally, all OpenTelemetry semantic conventions for http spans should be set in `span.data` if applicable: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/
`db` spans containing database queries (sql, graphql, elasticsearch, mongodb, ...):
The `description` field should include the sanitized database command. All sensitive data should be removed and replaced with a placeholder. Additionally, all OpenTelemetry semantic conventions for database spans should be set in `span.data` if applicable: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/
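To make the placeholder rule for database spans concrete, here is a deliberately naive sketch that replaces literals in a SQL string. The function name `sanitize_sql` and the `%s` placeholder are assumptions for this example. A regular expression cannot cover every SQL dialect, so a real integration would preferably capture the parameterized query before values are bound.

```python
import re

# Matches single-quoted string literals (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def sanitize_sql(query, placeholder="%s"):
    """Replace string and numeric literals with a placeholder."""
    return _LITERAL.sub(placeholder, query)


query = "SELECT * FROM users WHERE email = 'a@b.com' AND age > 30"
# Hypothetical span representation as a plain dict.
span = {"op": "db", "description": sanitize_sql(query)}
```

Here `span["description"]` becomes `SELECT * FROM users WHERE email = %s AND age > %s`.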
Breadcrumbs
If the `message` in a breadcrumb contains a URL, it should be formatted the same way as in `http` spans (see above). The query and the fragment should also be set in the data attribute, as with `http` spans.
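The URL formatting shared by `http` spans and breadcrumbs can be sketched with Python's standard `urllib.parse`. The helper name `describe_http_request` is hypothetical, and keeping an explicit port as part of the host is an assumption, since the format only mentions `host`.

```python
from urllib.parse import urlsplit


def describe_http_request(method, url):
    """Split a URL into a scrubbed description and a data dict."""
    parts = urlsplit(url)
    # Drop any "user:password@" prefix; keep the host (and port, if present).
    host = parts.netloc.rpartition("@")[2]
    description = f"{method.upper()} {parts.scheme}://{host}{parts.path}"

    data = {}
    if parts.query:
        data["http.query"] = parts.query
    if parts.fragment:
        data["http.fragment"] = parts.fragment
    return description, data


description, data = describe_http_request(
    "get", "https://username:password@example.com/foo?a=1#top"
)
# Hypothetical breadcrumb representation as a plain dict.
breadcrumb = {"message": description, "data": data}
```

For the URL above, the description is `GET https://example.com/foo`, while the query `a=1` and the fragment `top` end up in the data dict.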
Variable Size
Fields in the event payload that allow user-specified or dynamic values are restricted in size. This applies to most metadata fields, such as variables in a stack trace, as well as contexts, tags and extra data:
- Mappings of values (such as HTTP data, extra data, etc) are limited to 50 item pairs.
- Event IDs are limited to 36 characters and must be valid UUIDs.
- Tag keys are limited to 32 characters.
- Tag values are limited to 200 characters.
- Culprits are limited to 200 characters.
- Context objects are limited to 8kB.
- Individual extra data items are limited to 16kB. Total extra data is limited to 256kB.
- Messages are limited to 8192 characters.
- HTTP data (the body) is limited to 8kB. Always trim HTTP data before attaching it to the event.
- Stack traces are limited to 50 frames. If more are sent, data will be removed from the middle of the stack.
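A few of the limits above can be applied client-side along these lines. This is only a sketch: the helper names are hypothetical, and truncating oversized values (rather than dropping them) is an assumption, since the list states the limits but not how an SDK should enforce them.

```python
MAX_TAG_KEY = 32      # characters
MAX_TAG_VALUE = 200   # characters
MAX_MESSAGE = 8192    # characters
MAX_FRAMES = 50       # stack frames


def trim_tags(tags):
    """Truncate tag keys and values to the documented lengths."""
    return {str(k)[:MAX_TAG_KEY]: str(v)[:MAX_TAG_VALUE] for k, v in tags.items()}


def trim_message(message):
    """Truncate a message to the documented length."""
    return message[:MAX_MESSAGE]


def trim_frames(frames, limit=MAX_FRAMES):
    """Keep the outermost and innermost frames, removing from the middle."""
    frames = list(frames)
    if len(frames) <= limit:
        return frames
    head = limit // 2
    tail = limit - head
    return frames[:head] + frames[len(frames) - tail:]
```

Removing frames from the middle mirrors what the list says happens to oversized stack traces, keeping both ends of the stack, which tend to be the most useful for debugging.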
Additionally, size limits apply to all store requests for the total size of the request, event payload, and attachments. Sentry rejects all requests exceeding these limits. Please refer to the following resources for the exact size limits: