Amazon contributes novel causal machine learning algorithms to DoWhy Python library

We are excited to announce that we are open-sourcing causal machine learning (ML) algorithms that are the result of [years of Amazon research](https://www.amazon.science/tag/causal-discovery) on [graphical](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) causal models. The algorithms enable a variety of complex causal queries in addition to the usual effect estimation, including but not limited to root-cause analysis of outliers and distribution changes, causal-structure learning, and diagnosis of causal structures. Internally, they have been used by Amazon teams ranging from [Supply Chain](https://www.amazon.science/tag/supply-chain-optimization-technologies) to Amazon Web Services (AWS).

We are also excited that, in [a joint effort with Microsoft](https://www.microsoft.com/en-us/research/blog/dowhy-evolves-to-independent-pywhy-model-to-help-causal-inference-grow/), we have created a new GitHub organization called PyWhy. PyWhy serves as the new home of DoWhy, a causal ML library from Microsoft, into which we are merging our algorithms. DoWhy is one of the most popular causality libraries on GitHub. Amazon and Microsoft are delighted to be working together with the community of DoWhy users and contributors. As our colleague Amazon principal scientist [Dominik Janzing](https://www.amazon.science/author/dominik-janzing) said, "It's exciting to see our team's work of the last three years shared with the whole scientific community."

#### **Graphical causal models**

Most real-world systems, be they distributed-computing systems, supply chain systems, or manufacturing processes, can be described using variables that may or may not exert causal influence on each other.

Think, for instance, of a microservice architecture consisting of many different web services. What is the cause of increased website loading times? Is it a slow database in the back end? A malfunctioning load balancer? A slow network?

Existing libraries for causality, including DoWhy, focus on various types of effect estimation, where the general goal is to identify the effect of interventions on some target variable. In the case of a microservice architecture, they would help answer questions like "If I make this change in my caching service configuration, will it improve the website loading times, or will it make them worse?"

Our contribution complements DoWhy's existing feature set by leveraging the power of graphical causal models (GCMs). GCMs are a formal framework developed by Turing Award winner [Judea Pearl](https://en.wikipedia.org/wiki/Judea_Pearl) to model cause-effect relationships between variables in a system. A key ingredient of a GCM is the causal diagram, which visually represents the cause-effect relationships among the observed variables, with an arrow from each cause to its effect.

![Effect estimation (left) versus root-cause analysis (right)](https://dev-media.amazoncloud.cn/10b317d8d2334506bcd64a98cd065ef5_%E4%B8%8B%E8%BD%BD.jpg)

In effect estimation (left), analysts intervene at some point in a causal process (hammer) and observe the consequences (orange). This cuts out the influence of causes upstream from the point of intervention (scissors), and the effects of the intervention may vary (color gradations) as they propagate through the causal chain. In root-cause analysis (right), by contrast, analysts observe an effect (here, a website slowdown) and, by systematically controlling for other explanations, identify the event (here, a problem with the caching service) most directly responsible for it.

Each variable in a causal diagram has its own causal mechanism, which describes how its values are generated from the values of its parents. We can train probabilistic models to learn these causal mechanisms and use them to attribute anomalous events or changes in mechanisms to specific nodes. This decomposition into contributions of mechanisms is the core idea behind our novel algorithms for root-cause analysis.

As an example, in the microservice architecture mentioned above, we might accidentally deploy a defective service that uses a suboptimal SQL query to get data from the database, increasing website latencies. Using a feature we call "distribution change attribution", we can identify the defective service.

![Root-cause analysis with Shapley values](https://dev-media.amazoncloud.cn/d2c3ebe61805403787ccc3e9d94bc6a3_%E4%B8%8B%E8%BD%BD%20%281%29.jpg)

An Amazon algorithm for root-cause analysis adapts the game-theoretical concept of Shapley values to determine the contributions of different causal mechanisms to the outcome of a causal sequence. From "[Explaining changes in real-world data](https://www.amazon.science/blog/explaining-changes-in-real-world-data)".

But GCMs can do more: they can be used to compute the effects of interventions, estimate counterfactuals, compute the direct and intrinsic influences of nodes on their descendants, or attribute anomalies to potential upstream root causes. By releasing our algorithms, we hope to make these tools available to a broader audience of researchers and practitioners and help advance the scientific methods around GCMs.

#### **PyWhy**

For effect estimation, DoWhy already uses two of the most popular scientific frameworks for causal inference, graphical causal models and potential outcomes, and combines them in one library. With our contribution, we hope to further drive the synergy between the frameworks and their dedicated research communities.

But our long-term vision goes beyond DoWhy, potential outcomes, and GCMs. This is reflected in our effort to create PyWhy and our commitment to help steer the direction of this new GitHub organization. We welcome others to join our efforts and become part of the community.

Our hope and ambition for PyWhy, as its mission states, is to "build an open-source ecosystem for causal machine learning that moves forward the state of the art and makes it available to practitioners and researchers. We build and host interoperable libraries, tools, and other resources spanning a variety of causal tasks and applications, connected through a common API on foundational causal operations and a focus on the end-to-end-analysis process."

So if you are a scientist working on causal ML problems or are curious about them, visit [py-why.github.io/dowhy/gcm](http://py-why.github.io/dowhy/gcm) to learn more about the new GCM features in DoWhy or browse the source code on [github.com/py-why/dowhy](http://github.com/py-why/dowhy).

If you're the owner of a causal ML library and think your library would be a good fit for PyWhy, visit [github.com/py-why](http://github.com/py-why) to learn more about this new organization, or come [talk to us](https://discord.com/invite/cSBGb3vsZb) on Discord.

Acknowledgments: [Patrick Bloebaum](https://www.amazon.science/author/patrick-bloebaum), [Dominik Janzing](https://www.amazon.science/author/dominik-janzing)

ABOUT THE AUTHORS

#### **[Peter Götz](https://www.amazon.science/author/peter-goetz)**
Peter Götz is a senior software development engineer with Amazon Web Services.

#### **[Kailash Budhathoki](https://www.amazon.science/author/kailash-budhathoki)**
Kailash Budhathoki is an applied scientist at Amazon.
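
#### **Illustrative sketch: attributing a distribution change**

To make the Shapley-based attribution of distribution changes described in the article more concrete, here is a minimal toy sketch. It is not the algorithm's actual implementation and does not use DoWhy's API. The causal chain (database → service → website), the latency numbers, and the names `OLD`, `NEW`, `website_latency`, and `shapley_attribution` are all hypothetical, and each mechanism is simplified to a deterministic function of its parent's expected value rather than a learned probabilistic model.

```python
from itertools import permutations

# Hypothetical causal chain: database -> service -> website (latencies in ms).
ORDER = ["database", "service", "website"]

# Each toy mechanism maps the parent's expected value to the node's expected value.
OLD = {
    "database": lambda _parent: 10.0,
    "service": lambda db: 2.0 * db + 5.0,
    "website": lambda svc: svc + 20.0,
}
NEW = {
    "database": lambda _parent: 12.0,      # database got slightly slower
    "service": lambda db: 3.0 * db + 5.0,  # defective deployment: worse SQL query
    "website": lambda svc: svc + 20.0,     # unchanged mechanism
}


def website_latency(changed):
    """Expected website latency when the nodes in `changed` use their NEW mechanism."""
    value = None
    for node in ORDER:  # evaluate the chain in causal (topological) order
        mechanism = NEW[node] if node in changed else OLD[node]
        value = mechanism(value)
    return value


def shapley_attribution():
    """Average each node's marginal contribution over all replacement orderings."""
    totals = {node: 0.0 for node in ORDER}
    orderings = list(permutations(ORDER))
    for ordering in orderings:
        changed = set()
        before = website_latency(changed)
        for node in ordering:
            changed.add(node)  # swap this node's mechanism from OLD to NEW
            after = website_latency(changed)
            totals[node] += after - before
            before = after
    return {node: total / len(orderings) for node, total in totals.items()}


if __name__ == "__main__":
    print(shapley_attribution())
```

In this toy setup the expected website latency rises from 45 ms to 61 ms. The sketch attributes 11 ms of that increase to the service mechanism, 5 ms to the database, and nothing to the unchanged website mechanism, so the contributions sum to the total change of 16 ms. Averaging over orderings matters here because the two changes interact: the slower database costs more once the defective service is in place.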
