{"value":"In the Internet age, many computational tasks involve finding a handful of solutions in an enormous space of candidates. Question-answering systems, for instance, can pull answers from anywhere on the web, while the Wikipedia taxonomy for classifying article topic classification has 500,000 terms. And of course, a product query at the Amazon Store has millions of potential matches.\n\nSuch extreme multilabel ranking (XMR) problems pose two major challenges. The first is one of scale, but the second is one of scarcity. The items in these large search spaces tend to have long-tailed distributions: most sentences rarely serve as answers to questions; most topics in the Wikipedia taxonomy rarely apply to texts; most products are rarely purchased; and so on. That means that attempts to use machine learning to solve XMR problems rarely have enough data to go on.\n\nAt Amazon, we have developed a general framework for meeting both these challenges, which we call PECOS, for prediction for enormous and correlated output spaces. After successfully using PECOS internally for key projects in product search and recommendation, [we have publicly released the code](https://github.com/amzn/pecos) to help stimulate further research on this important topic.\n\nIn the XMR context, the items retrieved from the search space are known as labels. If the task is document retrieval, the documents themselves are interpreted as candidate labels for a search string; the search string is the input. The “multilabel” in XMR indicates that a given input may have multiple labels; several different topics from the Wikipedia taxonomy, for instance, might apply to the same document.\n\nPECOS decomposes the XMR problem into three stages:\n\n1. semantic label indexing, or grouping labels together according to semantic content;\n2. matching, or associating the input instance with a label group;\n3. ranking, or finding the labels in each group that best fit the input.\n\n![image.png](https://dev-media.amazoncloud.cn/461cb333b3e34effb16ead45601232a4_image.png)\n\nThe three-stage PECOS model.\nCREDIT: STACY REILLY\n\nPECOS lets users create their own algorithms to implement any of these stages, but the code release comes with a library of standard algorithms for each stage, including both a recursive linear model and a trained deep-learning model for matching.\n\nThe three-stage framework helps with both the scaling and long-tail problems. By enabling matching with groups of labels rather than individual labels, label indexing drastically reduces the search space for the matching step. It also helps with the long-tail problem, since it enables the ranking model to exploit semantic similarities between common labels and less common labels.\n\nFor machine-learning-based implementations of the ranking stage, label indexing aids in the selection of hard negatives. Machine learning models must be trained on both positive examples and negative examples; in the XMR context, most negative examples are so irrelevant as to impart little information to the model. Selecting negative examples from the same groups as the positive examples ensures that they’ll be challenging enough to improve the quality of the model.\n\nThe initial release of PECOS includes two models that implement the entire PECOS framework. One is a recursive linear model, the other a deep-learning model. 
The initial release of PECOS includes two models that implement the entire PECOS framework. One is a recursive linear model, the other a deep-learning model. In tests involving a dataset with 2.8 million labels, the deep-learning model improved the precision of the top-ranked result (precision@1) by 10% relative to the recursive linear model, but it took 265 times as long to train. It’s up to individual users to evaluate that trade-off for their own use cases.

#### **Semantic label indexing**

Semantic label indexing has two components: a representation scheme and a grouping algorithm. For text-based inputs, the representation scheme might take advantage of pre-trained text embeddings such as Word2Vec or ELMo; for graph-based inputs, it might use information about the input’s relationships with its neighbors in the graph. PECOS includes efficient implementations of representation schemes such as positive instance indices (PII), positive instance feature aggregation (PIFA), and the graph spectrum representation.

For grouping, we’ve concentrated on clustering algorithms, but users could implement other approaches, such as approximate nearest-neighbor search. PECOS includes our implementations of the k-means and spherical k-means clustering algorithms, which feature recursive B-ary partitioning. For some value of B (usually between 2 and 16), the algorithm first partitions the label set into B clusters, then partitions each of those into B clusters, and so on.

![image.png](https://dev-media.amazoncloud.cn/1a0f0829bb2f4fd6838c9eadd5d30c3f_image.png)

A simple example of our B-ary partitioning scheme.

In a [paper about PECOS](https://arxiv.org/pdf/2010.05878.pdf) that we’ve published on arXiv, we show that B-ary partitioning can significantly reduce the time required for semantic label indexing, an important consideration given that we’re dealing with enormous label spaces. We also use B-ary partitioning to implement the recursive linear model.

#### **Built-in models**

For text inputs, PECOS includes X-Transformer, which leverages pretrained transformer models from Hugging Face to improve performance on extreme multilabel text classification applications. At the 2020 Conference on Knowledge Discovery and Data Mining (KDD), we presented a [paper about the PECOS deep-learning model](https://www.amazon.science/publications/taming-pretrained-transformers-for-extreme-multi-label-text-classification), which we also described in [a related blog post](https://www.amazon.science/blog/natural-language-processing-techniques-text-classification-with-Transformers-at-scale) on Amazon Science.

PECOS also includes a linear model, XR-Linear, which learns its matching algorithm recursively. First, it learns a B-ary partition of the label space. Then, to implement a matcher for that partition, it learns a new B-ary partition for each of the existing groups. To implement matchers for those, it learns a new B-ary partition for each of them, and so on, until it reaches the desired recursive depth. At that point, it learns a simple linear one-versus-all ranker for the labels in each partition.

Then, for each level of recursion, it learns a ranker for the outputs of the layer below.

![image.png](https://dev-media.amazoncloud.cn/6a1f805993d64510821676d95f45abe6_image.png)

A diagram of the recursive linear matcher.

This makes training very efficient, as the full set of weights for each recursive layer can fit in memory at once, saving time on inefficient retrieval from storage.
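The recursive B-ary structure that both semantic label indexing and XR-Linear rely on can be sketched in a few lines of Python. The sketch below uses scikit-learn’s KMeans; the branching factor, embedding dimensionality, and stopping rule are illustrative assumptions rather than PECOS defaults.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(label_embeddings, label_ids, B=4, max_leaf=50):
    """Recursively split the label set into B clusters until groups are small."""
    if len(label_ids) <= max_leaf:
        return {"labels": label_ids}              # leaf: a small group of labels
    km = KMeans(n_clusters=B, n_init=10, random_state=0).fit(label_embeddings)
    children = []
    for b in range(B):
        mask = km.labels_ == b
        children.append(build_tree(label_embeddings[mask], label_ids[mask], B, max_leaf))
    return {"centroids": km.cluster_centers_, "children": children}

# Toy data: 1,000 labels with 64-dimensional embeddings (e.g., PIFA-style vectors).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))
tree = build_tree(embeddings, np.arange(1000), B=4, max_leaf=50)
```

In XR-Linear, each internal node of such a tree carries a linear matcher that scores which children are worth exploring for a given input, and each leaf carries a one-versus-all linear ranker over its labels; because training proceeds one layer at a time, only that layer’s weights need to be held in memory.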
At inference time, XR-Linear works through the same recursion tree to identify relevant labels. For efficiency, we use beam search to restrict the search space. For instance, if the beam width is two, then at each layer of the recursion tree, the model will pursue only the two highest-weight connections to the next layer.

![image.png](https://dev-media.amazoncloud.cn/3325b471f6784d5caf39969ec5b9d653_image.png)

An example of linear ranking with a beam width of two. At each level of the tree, two nodes (green) are selected for further exploration. Each of their descendant nodes is evaluated (orange), and two of those are selected for further exploration.
CREDIT: GIANA BUCCHINO
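A minimal sketch of that beam search, operating on the kind of tree built in the earlier sketch, might look like the following. The `score_fn` argument is a stand-in for the learned linear matchers and is an assumption of the example, not part of the PECOS API.

```python
import heapq

def beam_search(tree, score_fn, beam_width=2):
    """Walk the recursion tree, keeping only the top-scoring nodes at each level.

    tree:     nested dicts as in the partitioning sketch above
              (internal nodes have "children", leaves have "labels")
    score_fn: callable mapping a node to a relevance score for the current input
    """
    beam = [(score_fn(tree), tree)]
    leaves = []
    while beam:
        candidates = []
        for path_score, node in beam:
            if "labels" in node:                  # reached a leaf label group
                leaves.append((path_score, node["labels"]))
                continue
            for child in node["children"]:        # expand only surviving nodes
                candidates.append((path_score + score_fn(child), child))
        # Keep the beam_width highest-scoring nodes for the next level.
        beam = heapq.nlargest(beam_width, candidates, key=lambda item: item[0])
    # The labels in the surviving leaves would then be re-scored by the
    # one-versus-all rankers to produce the final ranking.
    return leaves

# Toy usage with the tree from the previous sketch and a dummy scorer:
# top_groups = beam_search(tree, score_fn=lambda node: 0.0, beam_width=2)
```

A wider beam explores more of the tree and typically improves recall, at a proportional cost in inference time.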
Our PECOS software has benefited from open research conducted at Amazon and at universities and other companies. By open-sourcing PECOS, we are thrilled to contribute back to the open-research community. Our hope is to spur further research on problems where the output spaces are very large. These include zero-shot learning for extreme multilabel problems, extreme contextual bandits, and deep reinforcement learning.

For more information about the optimizations we’ve incorporated into the PECOS code release, please see our [arXiv paper](https://arxiv.org/pdf/2010.05878.pdf). The code itself can be [downloaded from GitHub](https://github.com/amzn/pecos).

ABOUT THE AUTHORS

#### **[Hsiang-Fu Yu](https://www.amazon.science/author/hsiang-fu-yu)**
Hsiang-Fu Yu is a senior applied scientist at Amazon.

#### **Inderjit S. Dhillon**
Inderjit S. Dhillon is a vice president and distinguished scientist at Amazon and the Gottesman Family Centennial Professor in the computer science department of the University of Texas at Austin.