How to Identify Data Outliers

From Open Risk Manual

A standardized procedure for systematically identifying data outliers in a univariate sense comprises the following steps:

Issues and Challenges

This methodology provides a fast filter for flagging candidate outliers across large sets of variables, but it is not an automatic solution: flagged values still need to be reviewed in context.

  • Outliers are ultimately defined relative to a specific Data Generation Process, Data Collection Process, and data modelling and usage context. Hence what counts as an outlier can change with that context
  • The above methodology does not detect outliers in a multivariate sense
  • The above methodology is less suited to detecting outliers in categorical data
  • The above methodology is less suited to data with complex multi-modal distributions
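As a hedged illustration of per-variable (univariate) screening, and not the specific procedure of this article, the Python sketch below applies a common robust rule, the median/MAD modified z-score with the conventional 0.6745 scaling and 3.5 cut-off, to each variable separately. The helper names `mad_outliers` and `screen_variables`, the threshold, the handling of a zero MAD, and the sample data are all assumptions made for illustration. The last call shows the multi-modal limitation listed above: a value sitting between two clusters scores near zero because the median itself falls in the gap.

```python
import statistics
from typing import Dict, List


def mad_outliers(values: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper: a robust univariate rule based on the median and the
    median absolute deviation (MAD), not the article's own procedure.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: with zero spread the score is undefined, so flag every
        # value that differs from the median for manual review.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 makes the score comparable to a standard z-score for normal data.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


def screen_variables(
    data: Dict[str, List[float]], threshold: float = 3.5
) -> Dict[str, List[int]]:
    """Apply the univariate rule to each variable independently."""
    return {name: mad_outliers(vals, threshold) for name, vals in data.items()}


if __name__ == "__main__":
    # Illustrative data only.
    data = {
        "income": [52, 48, 50, 51, 49, 47, 53, 500],
        "age": [34, 36, 35, 33, 37, 35, 34, 36],
    }
    print(screen_variables(data))  # {'income': [7], 'age': []}

    # Multi-modal limitation: 5.5 lies between two clusters but is not flagged.
    bimodal = [1.0, 1.2, 0.9, 1.1, 10.0, 10.2, 9.9, 10.1, 5.5]
    print(mad_outliers(bimodal))  # []
```

Because each variable is scored on its own, a filter of this kind scales easily to many variables, but by construction it cannot see relationships between variables, and its flags are candidates for review rather than final verdicts.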
