{"value":"++[Outliers](https://en.wikipedia.org/wiki/Outlier)++ are rare observations where a system deviates from its usual behavior. They arise in many real-world applications (e.g., medicine, finance) and present a greater demand for explanation than ordinary events. How can we identify the \"root causes\" of outliers once they are detected?\n\nThe problem of outliers is one of the oldest problems in statistics. It has been the subject of academic investigation for more than a century. Although a lot has been done on ++[detecting outliers](https://link.springer.com/book/10.1007/978-94-015-3994-4)++, a formal way to define the “root causes” of outliers has been lacking.\n\nThis week, at the International Conference on Machine Learning (++[ICML](https://www.amazon.science/conferences-and-events/icml-2022)++), we are presenting ++[our work](https://www.amazon.science/publications/causal-structure-based-root-cause-analysis-of-outliers)++ on identifying the root causes of outliers. Our first task was to introduce a formal definition of “root cause”, because we were not able to find one in the academic literature.\n\nOur definition includes a formalization of the quantitative causal contribution of each of the root causes of an observed outlier. In other words, the contribution describes the degree to which a variable is responsible for the outlier event. This also relates to philosophical questions; even the purely qualitative question of whether an event is an “actual cause” of others is an ongoing debate among philosophers.\n\nOur approach is based on ++[graphical causal models](http://bayes.cs.ucla.edu/WHY/)++, a formal framework developed by Turing Award winner ++[Judea Pearl](https://en.wikipedia.org/wiki/Judea_Pearl)++ to model cause-effect relationships between variables in a system. It has two key ingredients. 
The first is a causal diagram, which represents the cause-effect relationships among the observed variables, with arrows pointing from the nodes representing causes to the nodes representing effects. The second is a set of causal mechanisms, which describe how each node’s value is generated from the values of its parents (i.e., its direct causes) in the causal diagram.

Imagine, for instance, a retail website powered by distributed web services. A customer experiences an unusually slow loading time. Why? Is it a slow database in the back end? A malfunctioning buying service?

![Dependency graph of the website’s services, a slow page load for the customer with ID 5, and candidate root causes among the services](https://dev-media.amazoncloud.cn/b8e61a41d1064515aa71f4a1e52b5e73_%E4%B8%8B%E8%BD%BD%20%288%29.jpg)

At left, the dependencies between the distributed web services that power a simple hypothetical retail website. In the middle, a customer (with ID 5) experiences a very slow loading time. Our goal is to identify its root causes among the distributed services (right).

There exist many [outlier detection algorithms](https://link.springer.com/book/10.1007/978-94-015-3994-4). To identify the root causes of outliers detected by one of these algorithms, we first introduce an information-theoretic (IT) outlier score, which probabilistically calibrates existing outlier scores.

Our outlier score relies on the notion of tail probability: the probability that a random variable exceeds a threshold value. The IT outlier score of an event is the negative logarithm of the event’s tail probability under some transformation. It is inspired by Claude Shannon’s definition of the [information content](https://en.wikipedia.org/wiki/Information_content) of a random event in [information theory](https://en.wikipedia.org/wiki/Information_theory).

The lower the likelihood of observing events more extreme than the event in question, the more information the event carries, and the larger its IT outlier score.
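
As a concrete (if simplified) illustration, an empirical version of such a score can be computed from reference samples of a variable. This is a minimal sketch, not the paper's estimator; the function name, the add-one smoothing, and the example latencies are all ours:

```python
import math

def it_outlier_score(value, samples):
    """Negative log of the estimated tail probability P(Z >= value).
    A minimal empirical sketch of an information-theoretic outlier
    score; add-one smoothing keeps the score finite for values more
    extreme than anything in the reference sample."""
    tail = (1 + sum(z >= value for z in samples)) / (len(samples) + 1)
    return -math.log(tail)

# Hypothetical reference latencies (seconds) and two events to score:
latencies = [0.20, 0.25, 0.30, 0.22, 0.28, 0.26, 0.24, 0.27, 0.23, 0.29]
typical = it_outlier_score(0.25, latencies)  # common value -> low score
extreme = it_outlier_score(5.00, latencies)  # rare value -> high score
```

The rarer the event, the larger the score; and because the score is a log tail probability rather than a raw outlier score, values are on a common footing across variables.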
Probabilistic calibration also renders IT outlier scores comparable across variables with different dimensions, ranges, and scalings.

#### **Counterfactuals**

To attribute the outlier event to a variable, we ask the counterfactual question “Would the event not have been an outlier had the causal mechanism of that variable been normal?” Counterfactuals are the third rung of [Pearl’s ladder of causation](http://bayes.cs.ucla.edu/WHY/why-ch1.pdf) and hence require [functional causal models (FCMs)](http://bayes.cs.ucla.edu/BOOK-2K/ch1-4.pdf) as the causal mechanisms of the variables.

In an FCM, each variable Xj is a function of its observed parents PAj (the nodes with direct arrows into Xj) in the causal diagram and an unobserved noise variable Nj. Because root nodes (those without observed parents) are driven by their noise variables alone, the joint distribution of the noise variables gives rise to the stochastic properties of the observed variables.

The unobserved noise variables play a special role: we can think of Nj as a random switch that selects a deterministic function (or mechanism) from a set of functions Fj mapping the direct causes PAj to their effect Xj. If, instead of fixing the value of the noise term Nj, we set it to random values drawn from some distribution, then the functions from the set Fj are also selected at random, and we can use this procedure to assign normal deterministic mechanisms to Xj.

Although this randomization might seem infeasible, since the noise variable is not under our control (and, worse, not even observable), we can interpret it as an intervention on the observed variable.

![Fixing the noise value identifies a deterministic mechanism; sampling the noise assigns normal mechanisms](https://dev-media.amazoncloud.cn/95fdae1b74aa4a3481ff0e38da033b2a_%E4%B8%8B%E8%BD%BD%20%288%29.jpg)

On the left, for the observed pair (xj, paj) of variable Xj and its parents PAj, the deterministic mechanism fj(1) of Xj is identified by the noise value (Nj = 1) corresponding to the pair (xj, paj). In the middle, a different noise value (Nj = n) identifies a counterfactual deterministic mechanism fj(n). On the right, by drawing random samples of the noise term Nj from some distribution, we assign “normal” deterministic mechanisms to Xj.

![Replacing a node’s mechanism with normal ones and measuring the change in the outlier’s log tail probability](https://dev-media.amazoncloud.cn/47469d5e4bc6473281988359bb01c182_%E4%B8%8B%E8%BD%BD%20%289%29.jpg)

To attribute the outlier event xn of the target variable Xn to a variable Xj, we first replace the deterministic mechanism of Xj with normal causal mechanisms (the orange background indicates the replacement). Then we measure the impact of this replacement on the log tail probability of the outlier event.

To attribute the outlier event xn (of the target variable Xn) to a variable Xj, then, we replace the deterministic mechanism corresponding to Xj’s observed value xj with normal mechanisms. The impact of this replacement on the log tail probability defines the contribution of Xj to the outlier event: it measures the factor by which replacing the causal mechanism of Xj with normal mechanisms (by drawing random samples of the noise Nj) decreases the likelihood of the outlier event. The contribution computed this way, however, depends on the order in which we replace the causal mechanisms, and this dependence on ordering introduces arbitrariness into the attribution procedure.

To remove the dependence on the ordering of the variables, we average the contribution over all orderings, which is also the idea behind the [Shapley value](https://en.wikipedia.org/wiki/Shapley_value) in game theory. The Shapley contributions sum up to the IT outlier score of the outlier event.

To get a high-level idea of how our approach works, consider again the retail-website example above. Dependencies between web services are typically available as a dependency graph. By inverting the arrows in the dependency graph, we obtain the causal graph of the services’ latencies.
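
Putting the pieces together, here is a minimal, self-contained Monte Carlo sketch of the attribution procedure. Everything in it is a toy assumption of ours: a hypothetical two-node additive model (database latency feeding web-service latency), made-up numbers, and a naive tail-probability estimator. It is not the production implementation, which lives in the DoWhy library mentioned at the end of this post:

```python
import math
import random
from itertools import permutations

random.seed(1)

# Hypothetical two-service chain: the database's latency X_db feeds the
# web service, whose latency is X_web = X_db + N_web (toy additive FCM).
NORMAL = {"db": lambda: random.gauss(0.2, 0.02),   # normal db latency (s)
          "web": lambda: random.gauss(0.1, 0.01)}  # web's own overhead (s)

# Observed outlier: a 1.4 s page load. For this additive toy model,
# abduction recovers the noise values behind the observation.
OBSERVED = {"db": 1.3, "web": 0.1}
TARGET = OBSERVED["db"] + OBSERVED["web"]

def neg_log_tail(replaced, trials=20000):
    """-log P(X_web >= observed outlier) when the mechanisms in
    `replaced` are reset to normal (their noise freshly sampled)
    while the rest stay pinned to their abducted values."""
    hits = 0
    for _ in range(trials):
        n = {j: (NORMAL[j]() if j in replaced else OBSERVED[j])
             for j in ("db", "web")}
        hits += (n["db"] + n["web"]) >= TARGET
    return -math.log((hits + 1) / (trials + 1))  # smoothed estimate

# Score each subset of replaced nodes once, so that the marginal
# contributions telescope exactly across orderings.
nodes = ["db", "web"]
score = {s: neg_log_tail(s) for s in
         (frozenset(), frozenset({"db"}), frozenset({"web"}),
          frozenset(nodes))}

# Shapley attribution: average each node's marginal score increase
# over all orders in which the mechanisms are replaced.
phi = {j: 0.0 for j in nodes}
orders = list(permutations(nodes))
for order in orders:
    replaced = set()
    for j in order:
        before = score[frozenset(replaced)]
        replaced.add(j)
        phi[j] += score[frozenset(replaced)] - before
phi = {j: c / len(orders) for j, c in phi.items()}
```

In this toy run, `phi["db"]` dominates, blaming the database for the slow page load, and by construction the contributions sum to the full IT outlier score (the score with every mechanism replaced).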
From training samples of observed latencies, we learn the causal mechanisms. The causal mechanisms may also be established directly using subject-matter expertise. Our approach uses them to attribute the slow loading time experienced by the specific customer to its most likely root causes among the web services.

![Causal graph of service latencies with each node’s contribution to the unusually high page latency](https://dev-media.amazoncloud.cn/3e964496decd45bba88441e0c071198c_%E4%B8%8B%E8%BD%BD%20%2810%29.jpg)

On the left, the causal graph of the services’ latencies, obtained by inverting the arrows of the services’ dependency graph. By learning the causal mechanisms of the nodes from training data, our approach yields the contribution of each node to the outlier event (here, the unusually high latency of the web service). Because the Shapley contributions sum up to the IT outlier score of the outlier event, we can report the relative contribution of each ancestor (here, each service).

If you would like to apply our approach to your use case, the implementation is available in the [“gcm” package](https://arxiv.org/abs/2206.06821) of the Python [DoWhy](https://py-why.github.io/dowhy/) library. To get started quickly, check out our [example notebook](https://py-why.github.io/dowhy/example_notebooks/rca_microservice_architecture.html).

ABOUT THE AUTHORS

#### **[Kailash Budhathoki](https://www.amazon.science/author/kailash-budhathoki)**

Kailash Budhathoki is an applied scientist at Amazon.

#### **[Patrick Blöbaum](https://www.amazon.science/author/patrick-bloebaum)**

Patrick Blöbaum is a senior applied scientist with Amazon Web Services.