Custom packages and hot reload of dictionary files with Amazon OpenSearch Service

{"value":"[Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) is a fully managed service that you can use to deploy and operate OpenSearch clusters cost-effectively at scale in the Amazon Web Services Cloud. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).\n\nThere are various use cases such as website search, ecommerce search, and enterprise search where the user wants to get relevant content for specific terms. Search engines match the terms (words) sent through the query API. When there are many different ways of specifying the same concept, you use synonyms to give the search engine more match targets than what the user entered.\n\nSimilarly, there are certain use cases where input data has a lot of common or frequently occurring words that don’t add much relevance when used in a search query. These include words like “the,” “this,” and “that.” These can be classified as stopwords.\n\nOpenSearch Service lets you upload custom dictionary files containing synonyms and stopwords tailored to your use case. This is especially useful for use cases where you want to do the following:\n\n- Specify words that can be treated as equivalent. For example, you can specify that words such as “bread,” “danish,” and “croissant” be treated as synonymous. This leads to better search results because instead of returning a null result if an exact match isn’t found, an approximately relevant or equivalent result is returned.\n- Ignore certain high-frequency terms that are common and contribute little to the search’s relevance score. 
These could include “a,” “the,” “of,” “an,” and so on.\n\n\nSpecifying stems, synonyms, and stopwords can greatly help with query accuracy and allows you to customize and enhance query relevance. They can also help with stemming (such as in the Japanese (kuromoji) Analysis Plugin). Stemming is reducing a word to its root form. For example, “cooking” and “cooked” can be stemmed to the same root word “cook.” This way, any variants of a word can be stemmed to one root word to enhance the query results.\n\nIn this post, we show how we can add custom packages for synonyms and stopwords to an OpenSearch Service domain. We start by creating custom packages for synonyms and stopwords and creating a custom analyzer for a sample index that uses the standard tokenizer with stop and synonym token filters, followed by a demonstration of hot reload of dictionary files.\n\n\n#### **Tokenizers and token filters**\n\n\nTokenizers break streams of characters into tokens (typically words) based on some set of rules. The simplest example is the whitespace tokenizer, which breaks the preceding characters into a token each time it encounters a whitespace character. A more complex example is the standard tokenizer, which uses a set of grammar-based rules to work across many languages.\n\nToken filters add, modify, or delete tokens. For example, a synonym token filter adds tokens when it finds a word in the synonyms list. The stop token filter removes tokens when it finds a word in the stopwords list.\n\n\n#### **Prerequisites**\n\n\nFor this demo, you must have an OpenSearch Service cluster (version 1.2) running. You can use this feature on any OpenSearch Service domain running OpenSearch or Elasticsearch 7.8 or later.\n\nUsers without administrator access require certain [Amazon Web Services Identity and Access Management](http://aws.amazon.com/iam) (IAM) actions in order to manage packages: ```es:CreatePackage```, ```es:DeletePackage```, ```es:AssociatePackage```, and ```es:DissociatePackage```. 
The user also needs permissions on the Amazon Simple Storage Service (Amazon S3) bucket path or object where the custom package resides. Grant all permissions within IAM, not in the domain access policy. Keeping permissions in IAM makes them easier to manage: you can change them independently of the domain, and the same user can perform the same actions across multiple OpenSearch Service domains (if needed).\n\n\n#### **Set up the custom packages**\n\n\nTo set up the solution, complete the following steps:\n\n1. On the Amazon S3 console, create a bucket to hold the custom packages.\n2. Upload the files with the stopwords and synonyms to this bucket. For this post, the file contents are as follows:\n\na. synonyms.txt:\n\n```\n pasta, penne, ravioli\n ice cream, gelato, frozen custard\n danish, croissant, pastry, bread\n```\n\nb. stopwords.txt:\n\n```\n the\n a\n an\n of\n```\n\nThe following screenshot shows the uploaded files:\n\n![image.png](https://dev-media.amazoncloud.cn/3e99d3b7310a469d9c275058527ac57b_image.png)\n\nNow we import our packages and associate them with a domain.\n\n3. On the OpenSearch Service console, choose **Packages** in the navigation pane.\n\n![image.png](https://dev-media.amazoncloud.cn/9d79393b8afe4e2b99d51bce22983e59_image.png)\n\n4. Choose **Import package**.\n\n![image.png](https://dev-media.amazoncloud.cn/3b34f1817358407585f29ff9203b3590_image.png)\n\n5. Enter a name for your package (for the synonym package, we use ```my-custom-synonym-package```) and an optional description.\n6. For **Package source**, enter the S3 location where synonyms.txt is stored.\n7. Choose **Submit**.\n\n![image.png](https://dev-media.amazoncloud.cn/66be6473a58d4dde863b85ad221cb990_image.png)\n\n8. Repeat these steps to create a package with stopwords.txt.\n9. Choose your synonym package when its status shows as ```Available```.\n\n![image.png](https://dev-media.amazoncloud.cn/ece162b529be444cab37a9afd61691b7_image.png)\n\n10. 
Choose **Associate to a domain**.\n\n![image.png](https://dev-media.amazoncloud.cn/d0f6114485cf4439a433e9f7534088d3_image.png)\n\n11. Select your OpenSearch Service domain, then choose **Associate**.\n\n![image.png](https://dev-media.amazoncloud.cn/fe1b0acfa4774585b0880df6f391fa53_image.png)\n\n12. Repeat these steps to associate your OpenSearch Service domain with the stopwords package.\n13. When the packages are available, note their IDs.\n\nYou use ```analyzers/id```, where ```id``` is the package ID, as the file path in your requests to OpenSearch.\n\n![image.png](https://dev-media.amazoncloud.cn/e9568123df9f493092c0474506411b45_image.png)\n\n\n#### **Use the custom packages with your data**\n\n\nAfter you associate a file with a domain, you can use it in parameters such as ```synonyms_path``` and ```stopwords_path``` when you create tokenizers and token filters. For more information, see OpenSearch Service.\n\nYou can create a new index (```my-index-test```) using the following snippet in the OpenSearch Service domain and specify the ```analyzers/id``` values for the synonyms and stopwords packages.\n\n1. Open OpenSearch Dashboards.\n2. On the **Home** menu, choose **Dev Tools**.\n\n![image.png](https://dev-media.amazoncloud.cn/cb77ee0e21fa4ba89b0437db231ee8ac_image.png)\n\n3. 
Enter the following code in the left pane:\n\n```\nPUT my-index-test\n{\n \"settings\": {\n \"index\": {\n \"analysis\": {\n \"analyzer\": {\n \"my_analyzer\": {\n \"type\": \"custom\",\n \"tokenizer\": \"standard\",\n \"filter\": [\"my_stop_filter\" , \"my_synonym_filter\"]\n }\n },\n \"filter\": {\n \"my_stop_filter\": {\n \"type\": \"stop\",\n \"stopwords_path\": \"analyzers/Fxxxxxxxxx\",\n \"updateable\": true\n },\n \"my_synonym_filter\": {\n \"type\": \"synonym\",\n \"synonyms_path\": \"analyzers/Fxxxxxxxxx\",\n \"updateable\": true\n }\n \n }\n }\n }\n },\n \"mappings\": {\n \"properties\": {\n \"description\": {\n \"type\": \"text\",\n \"analyzer\": \"standard\",\n \"search_analyzer\": \"my_analyzer\"\n }\n }\n }\n}\n```\n\n4. Choose the play sign to send the request to create the index with our custom synonyms and stopwords.\n\n![image.png](https://dev-media.amazoncloud.cn/536d8f7fbb884662b5324c9b4502652c_image.png)\n\nThe following screenshot shows our results.\n\n![image.png](https://dev-media.amazoncloud.cn/84875034758a41a582932e3c253d9bc7_image.png)\n\nThis request creates a custom analyzer for my index that uses the standard tokenizer and a synonym and stop token filter. This request also adds a text field (```description```) to the mapping and tells OpenSearch to use the new analyzer as its search analyzer. It still uses the standard analyzer as its index analyzer.\n\nNote the line ```\"updateable\": true``` in the token filter. 
This field only applies to search analyzers, not index analyzers, and is critical if you later want to [update the search analyzer](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/custom-packages.html#custom-packages-updating) automatically.\n\nLet’s start by adding some sample data to the index ```my-index-test```:\n\n```\nPOST _bulk\n{ \"index\": { \"_index\": \"my-index-test\", \"_id\": \"1\" } }\n{ \"description\": \"pasta\" }\n{ \"index\": { \"_index\": \"my-index-test\", \"_id\": \"2\" } }\n{ \"description\": \"the bread\" }\n{ \"index\": { \"_index\": \"my-index-test\", \"_id\": \"3\" } }\n{ \"description\": \"ice cream\" }\n{ \"index\": { \"_index\": \"my-index-test\", \"_id\": \"4\" } }\n{ \"description\": \"croissant\" }\n```\n\n![image.png](https://dev-media.amazoncloud.cn/a597cd3813a74a3081e87dbeeb7223c1_image.png)\n\nNow if you search for the words you specified in the synonyms.txt file, you get the required results. Note that my test index only has ```pasta``` in the indexed data, but because I specified “ravioli” as a synonym for “pasta” in my associated package, I get the results for all documents that have the word “pasta” when I search for “ravioli.”\n\n```\nGET my-index-test/_search\n{\n \"query\": {\n \"match\": {\n \"description\": \"ravioli\"\n }\n }\n}\n```\n\n![image.png](https://dev-media.amazoncloud.cn/1df1eaf48c804e00952cedaf544d8330_image.png)\n\nSimilarly, you can use the stopwords feature to specify common words that are filtered out of search queries because they contribute little to relevance.\n\n\n#### **Hot reload**\n\n\nNow let’s say you want to add another synonym (“spaghetti”) for “pasta.”\n\n1. 
The first step is to update the synonyms.txt file as follows and upload this updated file to your S3 bucket:\n\n```\npasta , penne , ravioli, spaghetti\nice cream, gelato, frozen custard\ndanish, croissant, pastry , bread\n```\n\n![image.png](https://dev-media.amazoncloud.cn/f70952d6fb00400faf2209d1f0f88785_image.png)\n\nUploading a new version of a package to Amazon S3 doesn’t automatically update the package on OpenSearch Service. OpenSearch Service stores its own copy of the file, so if you upload a new version to Amazon S3, you must manually update it in OpenSearch Service.\n\nIf you try to run the search query against the index for the term “spaghetti” at this point, you don’t get any results:\n\n```\nGET my-index-test/_search\n{\n \"query\": {\n \"match\": {\n \"description\": \"spaghetti\"\n }\n }\n}\n```\n\n![image.png](https://dev-media.amazoncloud.cn/51a15907f11944abb45d5b3eafcc6d05_image.png)\n\nAfter the file is modified in Amazon S3, update the package in OpenSearch Service, then apply the update. To do this, perform the following steps:\n\n2. On the OpenSearch Service console, choose **Packages**.\n3. Choose the package you created for custom synonyms and choose **Update**.\n\n![image.png](https://dev-media.amazoncloud.cn/8f953b6b2921419a8dab87629aa4536b_image.png)\n\n4. Provide the S3 path to the file, then choose **Update package**.\n\n![image.png](https://dev-media.amazoncloud.cn/a66ff44cfef646d696ee01d2415b2c59_image.png)\n\n5. Enter a description and choose **Update package**.\n\n![image.png](https://dev-media.amazoncloud.cn/afdfed44c7b944368122c0f37f411b73_image.png)\n\nYou return to the Packages page.\n\n6. When the package status shows as ```Available```, choose it and wait for the associated domain to show as updated.\n7. Select the domain and choose **Apply update**.\n\n![image.png](https://dev-media.amazoncloud.cn/4f2bd698a2de4742a62f32fb859b7d85_image.png)\n\n8. 
Choose **Apply update** again to confirm.\n\n![image.png](https://dev-media.amazoncloud.cn/b2633cd7509846b49d85325dc5aea8f6_image.png)\n\nWait for the association status to change to ```Active``` to confirm that the package version is also updated.\n\n![image.png](https://dev-media.amazoncloud.cn/0a3e2798f7b049589ae2e483c10c60f6_image.png)\n\nIf your domain runs Elasticsearch 7.7 or earlier, uses index analyzers, or doesn’t use the updateable field, and if you want to add some additional synonyms at a later time, you have to reindex your data with the new dictionary file. Previously, on Amazon Elasticsearch Service, these analyzers could only process data as it was indexed.\n\nIf your domain runs OpenSearch Service or Amazon Elasticsearch Service 7.8 or later and only uses search analyzers with the [updateable](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/custom-packages.html#custom-packages-using) field set to ```true```, you don’t need to take any further action. OpenSearch Service automatically updates your indexes using the [_plugins/_refresh_search_analyzers API](https://opensearch.org/docs/im-plugin/refresh-analyzer/index/). This allows for refresh of search analyzers in real time without you needing to close and reopen the index.\n\nThis feature, called hot reload, provides the ability to reload dictionary files without reindexing your data. With the new hot reload capability, you can call analyzers at search time, and your dictionary files augment the query. This feature also lets you version your dictionary files in OpenSearch Service and update them on your domains, without having to reindex your data.\n\nBecause the domain used in this demonstration runs OpenSearch Service 1.2, you can use hot reload without reindexing any data. 
Simply run a search query for the newly added synonym (“spaghetti”) to get all documents that match it or any of its synonyms:\n\n```\nGET my-index-test/_search\n{\n \"query\": {\n \"match\": {\n \"description\": \"spaghetti\"\n }\n }\n}\n```\n\n![image.png](https://dev-media.amazoncloud.cn/661bc44f6aec4f5093279c4192e071c6_image.png)\n\n\n\n#### **Conclusion**\n\n\nIn this post, we showed how easy it is to set up synonyms in OpenSearch Service so you can find the relevant documents that match a synonym for a word, even when the specific word isn’t used as the search term. We also demonstrated how to add and update existing synonym dictionaries and load those files to reflect the changes.\n\nIf you have feedback about this post, submit your comments in the comments section. You can also start a new thread on the [OpenSearch Service forum](https://forums.aws.amazon.com/forum.jspa?forumID=200) or [contact Amazon Web Services Support](https://console.aws.amazon.com/support/home) with questions.\n\n\n##### **About the Authors**\n\n\n![image.png](https://dev-media.amazoncloud.cn/9eef5cb349fa4a129a861415fc72e9b4_image.png)\n\n**Sonam Chaudhary** is a Solutions Architect and Big Data and Analytics Specialist at Amazon Web Services. She works with customers to build scalable, highly available, cost-effective, and secure solutions in the Amazon Web Services Cloud. In her free time, she likes traveling with her husband, shopping, and watching movies.\n\n![image.png](https://dev-media.amazoncloud.cn/7e13269f6e234f5c9d5988b43c620b26_image.png)\n\n**Prashant Agrawal** is a Search Specialist Solutions Architect with OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining Amazon Web Services, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. 
When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.\n\n"}
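To make the search-time behavior described in the post more concrete, the following is a minimal, offline Python sketch of what the custom search analyzer does conceptually: tokenize the query, drop stopwords, and expand the remaining tokens with synonyms before matching. It does not call OpenSearch. The function names (`parse_synonyms`, `tokenize`, `search_terms`, `search`) are hypothetical and exist only for illustration, the regex tokenizer and lowercasing are rough stand-ins for the standard tokenizer and standard index analyzer, and multi-word synonyms such as “ice cream” are not handled the way the real synonym token filter handles them.

```python
import re

# Dictionary contents copied from the post's synonyms.txt and stopwords.txt.
SYNONYMS_TXT = """pasta, penne, ravioli
ice cream, gelato, frozen custard
danish, croissant, pastry, bread"""
STOPWORDS_TXT = """the
a
an
of"""


def parse_synonyms(text):
    """Map every term to the full set of equivalent terms on its line."""
    table = {}
    for line in text.splitlines():
        group = [term.strip().lower() for term in line.split(",") if term.strip()]
        for term in group:
            table.setdefault(term, set()).update(group)
    return table


def tokenize(text):
    """Rough stand-in for the standard tokenizer (assumption: words, lowercased)."""
    return re.findall(r"\w+", text.lower())


def search_terms(query, stopwords, synonyms):
    """Apply the stop filter, then expand each remaining token with its synonyms."""
    terms = set()
    for token in tokenize(query):
        if token in stopwords:
            continue  # the stop filter removes this token
        terms.update(synonyms.get(token, {token}))
    return terms


def search(query, docs, stopwords, synonyms):
    """Return IDs of documents that share at least one token with the expanded query."""
    terms = search_terms(query, stopwords, synonyms)
    return sorted(doc_id for doc_id, text in docs.items() if terms & set(tokenize(text)))


# Same sample documents as the _bulk request in the post.
DOCS = {"1": "pasta", "2": "the bread", "3": "ice cream", "4": "croissant"}
stopwords = set(STOPWORDS_TXT.split())
synonyms = parse_synonyms(SYNONYMS_TXT)

print(search("ravioli", DOCS, stopwords, synonyms))    # ['1']
print(search("spaghetti", DOCS, stopwords, synonyms))  # []

# "Hot reload": swap in the updated dictionary; the indexed documents are untouched.
synonyms = parse_synonyms(SYNONYMS_TXT.replace("ravioli", "ravioli, spaghetti"))
print(search("spaghetti", DOCS, stopwords, synonyms))  # ['1']
```

Because the dictionaries are consulted only when the query is analyzed, replacing them changes the results without touching the stored documents, which is the same reason the updateable search analyzers in the post do not require reindexing.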