user_id on a latency metric seems useful. You could see latency per user. But you have a million users. Now you have a million time series for one metric. Metrics are meant to be aggregated. Tags with high cardinality defeat the purpose.
Example
- Before
- After
A request latency metric with One metric × one endpoint × one million users = one million time series.
user_id as a tag:user_id, request_id, session_id, trace_id, ip_address, url (with query params), error_message. The test: could this tag have thousands or millions of unique values?
Recommended enforcement
Enforce at edge
Strip the tag before metrics reach your provider.
Open PRs
Remove the tag from instrumentation code.