Simplify analytics on Amazon Redshift using PIVOT and UNPIVOT

海外精选
海外精选的内容汇集了全球优质的亚马逊云科技相关技术内容。同时,内容中提到的“AWS” 是 “Amazon Web Services” 的缩写,在此网站不作为商标展示。
0
0
{"value":"[Amazon Redshift](http://aws.amazon.com/redshift) is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools.\n\nMany customers look to build their data warehouse on [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail), and they have many requirements where they want to convert data from row level to column level and vice versa. [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) now natively supports PIVOT and UNPIVOT SQL operators with built-in optimizations that you can use for data modeling, data analysis, and data presentation. You can apply PIVOT and UNPIVOT to tables, sub-queries, and common table expressions (CTEs). PIVOT supports the COUNT, SUM, MIN, MAX, and AVG aggregate functions.\n\nYou can use PIVOT tables as a statistics tool that summarizes and reorganizes selected columns and rows of data from a dataset. The following are a few scenarios where this can be useful:\n\n- Group the values by at least one column\n- Convert the unique values of a selected column into new column names\n- Use in combination with aggregate functions to derive complex reports\n- Filter specific values in rows and convert them into columns or vice versa\n- Use these operators to generate a multidimensional reporting\n\nIn this post, we discuss the benefits of PIVOT and UNPIVOT, and how you can use them to simplify your analytics in [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail).\n\n\n#### **PIVOT overview**\n\n\nThe following code illustrates the PIVOT syntax:\n\n```\\nSELECT … \\nFROM \\n (<get_source_data>) \\n AS <alias_source_query> \\nPIVOT \\n( \\n <agg_func>(<agg_col>) \\nFOR \\n[<pivot_col>] \\n IN ( [pivot_value_first], [pivot_value_second], \\n ... [pivot_value_last]) \\n) AS <alias_pivot>\\n<optional ORDER BY clause>;\\n```\n\nThe syntax contains the following parameters:\n\n1. **<get_source_data>** – The SELECT query that gets the data from the source table\\n2. **<alias_source_query>** – The alias for the source query that gets the data\\n3. **<agg_func>** – The aggregate function to apply\\n4. **<agg_col>** – The column to aggregate\\n5. **<pivot_col>** – The column whose value is pivoted\\n6. **<pivot_value_n>** – A list of pivot column values separated by commas\\n7. **<alias_pivot>** – The alias for the pivot table\\n8. **<optional ORDER BY clause>** – An optional parameter to apply an ORDER BY clause on the result set\\n\\nThe following diagram illustrates how PIVOT works.\\n\\n![image.png](https://dev-media.amazoncloud.cn/8f2cc0ff15ee4bc697d225a45ac95b67_image.png)\\n\\n**PIVOT instead of CASE statements**\\n\\nLet’s look at an example of analyzing data from a different perspective than how it’s stored in the table. In the following example, book sales data is stored by year for each book. We want to look at the **book_sales** dataset by year and analyze if there were any books sold or not, and if sold, how many books were sold for each title. The following screenshot shows our query.\\n\\n![image.png](https://dev-media.amazoncloud.cn/fc12762b56554f3f8bdbf779d0203701_image.png)\\n\\nThe following screenshot shows our output.\\n\\n![image.png](https://dev-media.amazoncloud.cn/e5ba402fd4bf4b9f8ee00e21fbb448bc_image.png)\\n\\nPreviously, you had to derive your desired results set using a CASE statement. This requires you to add an individual CASE statement with the column name for each title, as shown in the following code:\\n\\n```\\nSELECT year,\\nMAX (CASE WHEN bookname = 'LOTR' THEN sales ELSE NULL END) LOTR,\\nMAX (CASE WHEN bookname = 'GOT' THEN sales ELSE NULL END) GOT,\\nMAX (CASE WHEN bookname = 'Harry Potter' THEN sales else NULL\\nEND) \\"Harry Potter\\",\\nMAX (CASE WHEN bookname = 'Sherlock' THEN sales ELSE NULL END)\\nsherlock\\nFROM book_sales GROUP BY year order by year;\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/8290c98dc6de40c4909946201ea7e5de_image.png)\\n\\nWith the out-of-the-box PIVOT operator, you can use a simpler SQL statement to achieve the same results:\\n\\n```\\nSELECT *\\nFROM\\n(\\n SELECT bookname, year, sales\\n FROM book_sales\\n) AS d\\nPIVOT\\n(\\n MAX(sales)\\n FOR bookname IN ('LOTR', 'GOT', 'Harry Potter', 'Sherlock')\\n) AS piv\\norder by year;\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/955af1a08ffe4376a4fc4b4701831e1f_image.png)\\n\\n\\n#### **UNPIVOT overview**\\n\\n\\nThe following code illustrates the UNPIVOT syntax:\\n\\n```\\nSELECT ...\\nFROM \\n (<get_source_data>) \\n AS <alias_source_query> \\nUNPIVOT <optional INCLUDE NULLS>\\n( \\n <value_col>\\nFOR \\n<name_col> \\n IN (column_name_1, column_name_2 ..... column_name_n) \\n) AS <alias_unpivot>\\n<optional ORDER BY clause>; \\n```\\n\\nThe code uses the following parameters:\\n\\n- **<get_source_data>** – The SELECT query that gets the data from the source table.\\n- **<alias_source_query>** – The alias for the source query that gets the data.\\n- **<optional INCLUDE NULLS>** – An optional parameter to include NULL values in the result set. By default, NULLs in input columns aren’t inserted as result rows.\\n- **<value_col>** – The name assigned to the generated column that contains the row values from the column list.\\n- **<name_col>** – The name assigned to the generated column that contains the column names from the column list.\\n- **<column_name_n>** – The column names from the source table or subquery to populate ```value_col``` and ```name_col```.\\n- **<alias_unpivot>** – The alias for the unpivot table.\\n- **<optional ORDER BY clause>** – An optional parameter to apply an ORDER BY clause on the result set.\\n\\nThe following diagram illustrates how UNPIVOT works.\\n\\n![image.png](https://dev-media.amazoncloud.cn/a1cdd24ed1e64992bf98e4b09b3763bb_image.png)\\n\\n\\n#### **UNPIVOT instead of UNION ALL queries**\\n\\n\\nLet’s look at the following example query with ```book_sales_pivot```.\\n\\n![image.png](https://dev-media.amazoncloud.cn/01b92164024b4118b004a3781ccb1716_image.png)\\n\\nWe get the following output.\\n\\n![image.png](https://dev-media.amazoncloud.cn/cde3250ec4c5438b99264354ca4d5591_image.png)\\n\\nPreviously, you had to derive this result set using UNION ALL, which resulted in a long and complex query form, as shown in the following code:\\n\\n```\\nselect * from\\n(SELECT year, 'lotr' AS book, LOTR AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'got' AS book, GOT AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'harry potter' AS book, \\"Harry Potter\\" AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'sherlock' AS book, \\"Sherlock\\" AS sales FROM (SELECT * FROM book_sales_pivot)\\n)\\norder by year;\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/9e31e8cd970e41b18c5b2962aab4eaea_image.png)\\n\\nWith UNPIVOT, you can use the following simplified query:\\n\\n```\\nselect * from book_sales_pivot UNPIVOT INCLUDE NULLS\\n(sales for book in (\\"LOTR\\", \\"GOT\\", \\"Harry Potter\\", \\"Sherlock\\"))\\norder by year;\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/3f1adf875d914cc192b0ca33c190e52f_image.png)\\n\\nUNPIVOT is straightforward compared to UNION ALL. You can further clean this output by excluding NULL values from the result set. For example, you can exclude book titles from the result set if there were no sales in a year:\\n\\n```\\nselect * from book_PIVOT UNPIVOT\\n(sales for book in (\\"LOTR\\", \\"GOT\\", \\"Harry Potter\\", \\"Sherlock\\"))\\norder by year;\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/81678f61ae374642a780d16af9d0b4bd_image.png)\\n\\nBy default, NULL values in the input column are skipped and don’t yield a result row.\\n\\nNow that we understand the basic interface and usability, let’s dive into a few complex use cases.\\n\\n\\n#### **Dynamic PIVOT tables using stored procedures**\\n\\n\\nThe query of PIVOT is static, meaning that you have to enter a list of PIVOT column names manually. In some scenarios, you may not want to manually use your PIVOT values because your data keeps changing, and it gets difficult to maintain the list of values and update the PIVOT query manually.\\n\\nTo handle these scenarios, you can take advantage of the dynamic PIVOT stored procedure:\\n\\n```\\n/*\\n non_pivot_cols : Text list of columns to be added to the SELECT clause\\n table_name : Schema qualified name of table to be queried\\n agg_func : Name of the aggregate function to apply\\n agg_col : Name of the column to be aggregated\\n pivot_col : Name of the column whose value will be pivoted\\n result_set : Name of cursor used for output \\n */ \\n\\nCREATE OR REPLACE PROCEDURE public.sp_dynamicPIVOT\\n(\\nnon_pivot_cols IN VARCHAR(MAX),\\ntable_name IN VARCHAR(MAX),\\nagg_func IN VARCHAR(32),\\nagg_col IN VARCHAR(MAX),\\npivot_col IN VARCHAR(100),\\nresult_set INOUT REFCURSOR )\\nAS \$\$\\nDECLARE\\nsql VARCHAR(MAX) := '';\\nresult_t VARCHAR(MAX) := '';\\nPIVOT_sql VARCHAR(MAX);\\ncnt INTEGER := 1;\\nno_of_parts INTEGER := 0;\\nitem_for_col character varying := '';\\nitem_pivot_cols character varying := '';\\nBEGIN\\n\\nsql := 'SELECT listagg (distinct ' || pivot_col || ', '','') within group (order by ' || pivot_col || ') from ' || table_name || ';';\\n\\nEXECUTE sql ||' ;' INTO result_t;\\n\\n\\nno_of_parts := (select REGEXP_COUNT ( result_t , ',' ));\\n\\n\\n<<simple_loop_exit_continue>>\\n LOOP\\n item_for_col := item_for_col + '''' + (select split_part(\\"result_t\\",',',cnt)) +''''; \\n item_pivot_cols := item_pivot_cols + '\\"' + (select split_part(\\"result_t\\",',',cnt)) +'\\"'; \\n cnt = cnt + 1;\\n IF (cnt < no_of_parts + 2) THEN\\n item_for_col := item_for_col + ',';\\n item_pivot_cols := item_pivot_cols + ',';\\n END IF;\\n EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);\\n END LOOP;\\n\\n\\nPIVOT_sql := 'SELECT ' || non_PIVOT_cols || ',' || item_pivot_cols || ' from ( select * from ' || table_name || ' ) as src_data PIVOT ( ' || agg_func || '(' || agg_col || ') FOR ' || pivot_col || ' IN (' || item_for_col || ' )) as PIV order by ' || non_PIVOT_cols || ';';\\n\\n\\n-- Open the cursor and execute the SQL\\nOPEN result_set FOR EXECUTE PIVOT_sql;\\n\\nEND;\\n\$\$ LANGUAGE plpgsql;\\n\\n\\nExample:\\nBEGIN;\\nCALL public.sp_dynamicPIVOT ('year','public.book_sales','MAX','sales','bookname', 'PIVOT_result');\\nFETCH ALL FROM PIVOT_result; CLOSE PIVOT_result;\\nEND;\\n```\\n\\n\\n#### **PIVOT example using CTEs**\\n\\n\\nYou can use PIVOT as part of a CTE (Common Table Expression). See the following example code:\\n\\n```\\nwith dataset1 as\\n(Select bookname,sales from public.book_sales)\\nselect * from dataset1 PIVOT (\\n sum(sales)\\n FOR bookname IN ('LOTR', 'GOT', 'Harry Potter', 'Sherlock')\\n);\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/76fc20f9c30946f19185a23e0aeccfc7_image.png)\\n\\n\\n\\n#### **Multiple aggregations for PIVOT**\\n\\n\\nThe following code illustrates multiple aggregations for PIVOT:\\n\\n```\\nWITH dataset1 AS\\n(\\n SELECT 1 AS \\"rownum\\",\\n bookname,\\n sales\\n FROM PUBLIC.book_sales)\\nSELECT *\\nFROM (\\n SELECT rownum,\\"LOTR\\" as avg_sales_lotr,\\"GOT\\" as avg_sales_got,\\"Harry Potter\\" as avg_sales_harrypotter,\\"Sherlock\\" as avg_sales_sherlock\\n FROM dataset1 PIVOT (avg(sales) FOR bookname IN ('LOTR','GOT','Harry Potter','Sherlock')) AS avg_sales) a\\nJOIN\\n (\\n SELECT rownum, \\"LOTR\\" as sum_sales_lotr,\\"GOT\\" as sum_sales_got,\\"Harry Potter\\" as sum_sales_harrypotter,\\"Sherlock\\" as sum_sales_sherlock\\n FROM dataset1 PIVOT (sum(sales) FOR bookname IN ('LOTR',\\n 'GOT', 'Harry Potter', 'Sherlock')) AS sum_sales) b\\nusing (rownum);\\n```\\n\\n![image.png](https://dev-media.amazoncloud.cn/60f09df2a7094f898259ac954676575e_image.png)\\n\\n\\n#### **Summary**\\n\\n\\nAlthough PIVOT and UNPIVOT aren’t entirely new paradigms of SQL language, the new native support for these operators in Amazon Redshift can help you achieve many robust use cases without the hassle of using alternate operators. In this post, we explored a few ways in which the new operators may come in handy.\\n\\nAdapt [PIVOT and UNPIVOT](https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html) into your workstreams now and work with us as we evolve the feature, incorporating more complex option sets. Please feel free to [reach out](https://aws.amazon.com/contact-us/) to us if you need further help to achieve your custom use cases.\\n\\n\\n#### **About the authors**\\n\\n\\n![image.png](https://dev-media.amazoncloud.cn/30e93f3a15264ae2b578404861f6a95b_image.png)\\n\\n**Ashish Agrawal** is currently Sr. Technical Product Manager with Amazon Redshift building cloud-based data warehouse and analytics cloud service. Ashish has over 24 years of experience in IT. Ashish has expertise in data warehouse, data lake, Platform as a Service. Ashish is speaker at worldwide technical conferences.\\n\\n![image.png](https://dev-media.amazoncloud.cn/838af050ec234a08968dabc68edd129b_image.png)\\n\\n**Sai Teja Boddapat**i is a Database Engineer based out of Seattle. He works on solving complex database problems to contribute to building the most user friendly data warehouse available. In his spare time, he loves travelling, playing games and watching movies & documentaries.\\n\\n![image.png](https://dev-media.amazoncloud.cn/dd0f409d85994c2d8974dea5b41fc7b5_image.png)\\n\\n**Maneesh Sharma** is a Senior Database Engineer at Amazon Web Services with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.\\n\\n![image.png](https://dev-media.amazoncloud.cn/1a0c9bfffaac45d2a35956efc3571376_image.png)\\n\\n**Eesha Kumar** is an Analytics Solutions Architect with Amazon Web Services . He works with customers to realize business value of data by helping them building solutions leveraging Amazon Web Services platform and tools.","render":"<p><a href=\\"http://aws.amazon.com/redshift\\" target=\\"_blank\\">Amazon Redshift</a> is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools.</p>\\n<p>Many customers look to build their data warehouse on Amazon Redshift, and they have many requirements where they want to convert data from row level to column level and vice versa. Amazon Redshift now natively supports PIVOT and UNPIVOT SQL operators with built-in optimizations that you can use for data modeling, data analysis, and data presentation. You can apply PIVOT and UNPIVOT to tables, sub-queries, and common table expressions (CTEs). PIVOT supports the COUNT, SUM, MIN, MAX, and AVG aggregate functions.</p>\n<p>You can use PIVOT tables as a statistics tool that summarizes and reorganizes selected columns and rows of data from a dataset. The following are a few scenarios where this can be useful:</p>\n<ul>\\n<li>Group the values by at least one column</li>\n<li>Convert the unique values of a selected column into new column names</li>\n<li>Use in combination with aggregate functions to derive complex reports</li>\n<li>Filter specific values in rows and convert them into columns or vice versa</li>\n<li>Use these operators to generate a multidimensional reporting</li>\n</ul>\\n<p>In this post, we discuss the benefits of PIVOT and UNPIVOT, and how you can use them to simplify your analytics in Amazon Redshift.</p>\n<h4><a id=\\"PIVOT_overview_15\\"></a><strong>PIVOT overview</strong></h4>\\n<p>The following code illustrates the PIVOT syntax:</p>\n<pre><code class=\\"lang-\\">SELECT … \\nFROM \\n (&lt;get_source_data&gt;) \\n AS &lt;alias_source_query&gt; \\nPIVOT \\n( \\n &lt;agg_func&gt;(&lt;agg_col&gt;) \\nFOR \\n[&lt;pivot_col&gt;] \\n IN ( [pivot_value_first], [pivot_value_second], \\n ... [pivot_value_last]) \\n) AS &lt;alias_pivot&gt;\\n&lt;optional ORDER BY clause&gt;;\\n</code></pre>\\n<p>The syntax contains the following parameters:</p>\n<ol>\\n<li><strong>&lt;get_source_data&gt;</strong> – The SELECT query that gets the data from the source table</li>\\n<li><strong>&lt;alias_source_query&gt;</strong> – The alias for the source query that gets the data</li>\\n<li><strong>&lt;agg_func&gt;</strong> – The aggregate function to apply</li>\\n<li><strong>&lt;agg_col&gt;</strong> – The column to aggregate</li>\\n<li><strong>&lt;pivot_col&gt;</strong> – The column whose value is pivoted</li>\\n<li><strong>&lt;pivot_value_n&gt;</strong> – A list of pivot column values separated by commas</li>\\n<li><strong>&lt;alias_pivot&gt;</strong> – The alias for the pivot table</li>\\n<li><strong>&lt;optional ORDER BY clause&gt;</strong> – An optional parameter to apply an ORDER BY clause on the result set</li>\\n</ol>\n<p>The following diagram illustrates how PIVOT works.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/8f2cc0ff15ee4bc697d225a45ac95b67_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>PIVOT instead of CASE statements</strong></p>\\n<p>Let’s look at an example of analyzing data from a different perspective than how it’s stored in the table. In the following example, book sales data is stored by year for each book. We want to look at the <strong>book_sales</strong> dataset by year and analyze if there were any books sold or not, and if sold, how many books were sold for each title. The following screenshot shows our query.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/fc12762b56554f3f8bdbf779d0203701_image.png\\" alt=\\"image.png\\" /></p>\n<p>The following screenshot shows our output.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/e5ba402fd4bf4b9f8ee00e21fbb448bc_image.png\\" alt=\\"image.png\\" /></p>\n<p>Previously, you had to derive your desired results set using a CASE statement. This requires you to add an individual CASE statement with the column name for each title, as shown in the following code:</p>\n<pre><code class=\\"lang-\\">SELECT year,\\nMAX (CASE WHEN bookname = 'LOTR' THEN sales ELSE NULL END) LOTR,\\nMAX (CASE WHEN bookname = 'GOT' THEN sales ELSE NULL END) GOT,\\nMAX (CASE WHEN bookname = 'Harry Potter' THEN sales else NULL\\nEND) &quot;Harry Potter&quot;,\\nMAX (CASE WHEN bookname = 'Sherlock' THEN sales ELSE NULL END)\\nsherlock\\nFROM book_sales GROUP BY year order by year;\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/8290c98dc6de40c4909946201ea7e5de_image.png\\" alt=\\"image.png\\" /></p>\n<p>With the out-of-the-box PIVOT operator, you can use a simpler SQL statement to achieve the same results:</p>\n<pre><code class=\\"lang-\\">SELECT *\\nFROM\\n(\\n SELECT bookname, year, sales\\n FROM book_sales\\n) AS d\\nPIVOT\\n(\\n MAX(sales)\\n FOR bookname IN ('LOTR', 'GOT', 'Harry Potter', 'Sherlock')\\n) AS piv\\norder by year;\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/955af1a08ffe4376a4fc4b4701831e1f_image.png\\" alt=\\"image.png\\" /></p>\n<h4><a id=\\"UNPIVOT_overview_96\\"></a><strong>UNPIVOT overview</strong></h4>\\n<p>The following code illustrates the UNPIVOT syntax:</p>\n<pre><code class=\\"lang-\\">SELECT ...\\nFROM \\n (&lt;get_source_data&gt;) \\n AS &lt;alias_source_query&gt; \\nUNPIVOT &lt;optional INCLUDE NULLS&gt;\\n( \\n &lt;value_col&gt;\\nFOR \\n&lt;name_col&gt; \\n IN (column_name_1, column_name_2 ..... column_name_n) \\n) AS &lt;alias_unpivot&gt;\\n&lt;optional ORDER BY clause&gt;; \\n</code></pre>\\n<p>The code uses the following parameters:</p>\n<ul>\\n<li><strong>&lt;get_source_data&gt;</strong> – The SELECT query that gets the data from the source table.</li>\\n<li><strong>&lt;alias_source_query&gt;</strong> – The alias for the source query that gets the data.</li>\\n<li><strong>&lt;optional INCLUDE NULLS&gt;</strong> – An optional parameter to include NULL values in the result set. By default, NULLs in input columns aren’t inserted as result rows.</li>\\n<li><strong>&lt;value_col&gt;</strong> – The name assigned to the generated column that contains the row values from the column list.</li>\\n<li><strong>&lt;name_col&gt;</strong> – The name assigned to the generated column that contains the column names from the column list.</li>\\n<li><strong>&lt;column_name_n&gt;</strong> – The column names from the source table or subquery to populate <code>value_col</code> and <code>name_col</code>.</li>\\n<li><strong>&lt;alias_unpivot&gt;</strong> – The alias for the unpivot table.</li>\\n<li><strong>&lt;optional ORDER BY clause&gt;</strong> – An optional parameter to apply an ORDER BY clause on the result set.</li>\\n</ul>\n<p>The following diagram illustrates how UNPIVOT works.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/a1cdd24ed1e64992bf98e4b09b3763bb_image.png\\" alt=\\"image.png\\" /></p>\n<h4><a id=\\"UNPIVOT_instead_of_UNION_ALL_queries_132\\"></a><strong>UNPIVOT instead of UNION ALL queries</strong></h4>\\n<p>Let’s look at the following example query with <code>book_sales_pivot</code>.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/01b92164024b4118b004a3781ccb1716_image.png\\" alt=\\"image.png\\" /></p>\n<p>We get the following output.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/cde3250ec4c5438b99264354ca4d5591_image.png\\" alt=\\"image.png\\" /></p>\n<p>Previously, you had to derive this result set using UNION ALL, which resulted in a long and complex query form, as shown in the following code:</p>\n<pre><code class=\\"lang-\\">select * from\\n(SELECT year, 'lotr' AS book, LOTR AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'got' AS book, GOT AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'harry potter' AS book, &quot;Harry Potter&quot; AS sales FROM (SELECT * FROM book_sales_pivot)\\nUNION ALL\\nSELECT year, 'sherlock' AS book, &quot;Sherlock&quot; AS sales FROM (SELECT * FROM book_sales_pivot)\\n)\\norder by year;\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/9e31e8cd970e41b18c5b2962aab4eaea_image.png\\" alt=\\"image.png\\" /></p>\n<p>With UNPIVOT, you can use the following simplified query:</p>\n<pre><code class=\\"lang-\\">select * from book_sales_pivot UNPIVOT INCLUDE NULLS\\n(sales for book in (&quot;LOTR&quot;, &quot;GOT&quot;, &quot;Harry Potter&quot;, &quot;Sherlock&quot;))\\norder by year;\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/3f1adf875d914cc192b0ca33c190e52f_image.png\\" alt=\\"image.png\\" /></p>\n<p>UNPIVOT is straightforward compared to UNION ALL. You can further clean this output by excluding NULL values from the result set. For example, you can exclude book titles from the result set if there were no sales in a year:</p>\n<pre><code class=\\"lang-\\">select * from book_PIVOT UNPIVOT\\n(sales for book in (&quot;LOTR&quot;, &quot;GOT&quot;, &quot;Harry Potter&quot;, &quot;Sherlock&quot;))\\norder by year;\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/81678f61ae374642a780d16af9d0b4bd_image.png\\" alt=\\"image.png\\" /></p>\n<p>By default, NULL values in the input column are skipped and don’t yield a result row.</p>\n<p>Now that we understand the basic interface and usability, let’s dive into a few complex use cases.</p>\n<h4><a id=\\"Dynamic_PIVOT_tables_using_stored_procedures_185\\"></a><strong>Dynamic PIVOT tables using stored procedures</strong></h4>\\n<p>The query of PIVOT is static, meaning that you have to enter a list of PIVOT column names manually. In some scenarios, you may not want to manually use your PIVOT values because your data keeps changing, and it gets difficult to maintain the list of values and update the PIVOT query manually.</p>\n<p>To handle these scenarios, you can take advantage of the dynamic PIVOT stored procedure:</p>\n<pre><code class=\\"lang-\\">/*\\n non_pivot_cols : Text list of columns to be added to the SELECT clause\\n table_name : Schema qualified name of table to be queried\\n agg_func : Name of the aggregate function to apply\\n agg_col : Name of the column to be aggregated\\n pivot_col : Name of the column whose value will be pivoted\\n result_set : Name of cursor used for output \\n */ \\n\\nCREATE OR REPLACE PROCEDURE public.sp_dynamicPIVOT\\n(\\nnon_pivot_cols IN VARCHAR(MAX),\\ntable_name IN VARCHAR(MAX),\\nagg_func IN VARCHAR(32),\\nagg_col IN VARCHAR(MAX),\\npivot_col IN VARCHAR(100),\\nresult_set INOUT REFCURSOR )\\nAS \$\$\\nDECLARE\\nsql VARCHAR(MAX) := '';\\nresult_t VARCHAR(MAX) := '';\\nPIVOT_sql VARCHAR(MAX);\\ncnt INTEGER := 1;\\nno_of_parts INTEGER := 0;\\nitem_for_col character varying := '';\\nitem_pivot_cols character varying := '';\\nBEGIN\\n\\nsql := 'SELECT listagg (distinct ' || pivot_col || ', '','') within group (order by ' || pivot_col || ') from ' || table_name || ';';\\n\\nEXECUTE sql ||' ;' INTO result_t;\\n\\n\\nno_of_parts := (select REGEXP_COUNT ( result_t , ',' ));\\n\\n\\n&lt;&lt;simple_loop_exit_continue&gt;&gt;\\n LOOP\\n item_for_col := item_for_col + '''' + (select split_part(&quot;result_t&quot;,',',cnt)) +''''; \\n item_pivot_cols := item_pivot_cols + '&quot;' + (select split_part(&quot;result_t&quot;,',',cnt)) +'&quot;'; \\n cnt = cnt + 1;\\n IF (cnt &lt; no_of_parts + 2) THEN\\n item_for_col := item_for_col + ',';\\n item_pivot_cols := item_pivot_cols + ',';\\n END IF;\\n EXIT simple_loop_exit_continue WHEN (cnt &gt;= no_of_parts + 2);\\n END LOOP;\\n\\n\\nPIVOT_sql := 'SELECT ' || non_PIVOT_cols || ',' || item_pivot_cols || ' from ( select * from ' || table_name || ' ) as src_data PIVOT ( ' || agg_func || '(' || agg_col || ') FOR ' || pivot_col || ' IN (' || item_for_col || ' )) as PIV order by ' || non_PIVOT_cols || ';';\\n\\n\\n-- Open the cursor and execute the SQL\\nOPEN result_set FOR EXECUTE PIVOT_sql;\\n\\nEND;\\n\$\$ LANGUAGE plpgsql;\\n\\n\\nExample:\\nBEGIN;\\nCALL public.sp_dynamicPIVOT ('year','public.book_sales','MAX','sales','bookname', 'PIVOT_result');\\nFETCH ALL FROM PIVOT_result; CLOSE PIVOT_result;\\nEND;\\n</code></pre>\\n<h4><a id=\\"PIVOT_example_using_CTEs_260\\"></a><strong>PIVOT example using CTEs</strong></h4>\\n<p>You can use PIVOT as part of a CTE (Common Table Expression). See the following example code:</p>\n<pre><code class=\\"lang-\\">with dataset1 as\\n(Select bookname,sales from public.book_sales)\\nselect * from dataset1 PIVOT (\\n sum(sales)\\n FOR bookname IN ('LOTR', 'GOT', 'Harry Potter', 'Sherlock')\\n);\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/76fc20f9c30946f19185a23e0aeccfc7_image.png\\" alt=\\"image.png\\" /></p>\n<h4><a id=\\"Multiple_aggregations_for_PIVOT_278\\"></a><strong>Multiple aggregations for PIVOT</strong></h4>\\n<p>The following code illustrates multiple aggregations for PIVOT:</p>\n<pre><code class=\\"lang-\\">WITH dataset1 AS\\n(\\n SELECT 1 AS &quot;rownum&quot;,\\n bookname,\\n sales\\n FROM PUBLIC.book_sales)\\nSELECT *\\nFROM (\\n SELECT rownum,&quot;LOTR&quot; as avg_sales_lotr,&quot;GOT&quot; as avg_sales_got,&quot;Harry Potter&quot; as avg_sales_harrypotter,&quot;Sherlock&quot; as avg_sales_sherlock\\n FROM dataset1 PIVOT (avg(sales) FOR bookname IN ('LOTR','GOT','Harry Potter','Sherlock')) AS avg_sales) a\\nJOIN\\n (\\n SELECT rownum, &quot;LOTR&quot; as sum_sales_lotr,&quot;GOT&quot; as sum_sales_got,&quot;Harry Potter&quot; as sum_sales_harrypotter,&quot;Sherlock&quot; as sum_sales_sherlock\\n FROM dataset1 PIVOT (sum(sales) FOR bookname IN ('LOTR',\\n 'GOT', 'Harry Potter', 'Sherlock')) AS sum_sales) b\\nusing (rownum);\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/60f09df2a7094f898259ac954676575e_image.png\\" alt=\\"image.png\\" /></p>\n<h4><a id=\\"Summary_305\\"></a><strong>Summary</strong></h4>\\n<p>Although PIVOT and UNPIVOT aren’t entirely new paradigms of SQL language, the new native support for these operators in Amazon Redshift can help you achieve many robust use cases without the hassle of using alternate operators. In this post, we explored a few ways in which the new operators may come in handy.</p>\n<p>Adapt <a href=\\"https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html\\" target=\\"_blank\\">PIVOT and UNPIVOT</a> into your workstreams now and work with us as we evolve the feature, incorporating more complex option sets. Please feel free to <a href=\\"https://aws.amazon.com/contact-us/\\" target=\\"_blank\\">reach out</a> to us if you need further help to achieve your custom use cases.</p>\\n<h4><a id=\\"About_the_authors_313\\"></a><strong>About the authors</strong></h4>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/30e93f3a15264ae2b578404861f6a95b_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Ashish Agrawal</strong> is currently Sr. Technical Product Manager with [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) building cloud-based data warehouse and analytics cloud service. Ashish has over 24 years of experience in IT. Ashish has expertise in data warehouse, data lake, Platform as a Service. Ashish is speaker at worldwide technical conferences.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/838af050ec234a08968dabc68edd129b_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Sai Teja Boddapat</strong>i is a Database Engineer based out of Seattle. He works on solving complex database problems to contribute to building the most user friendly data warehouse available. In his spare time, he loves travelling, playing games and watching movies &amp; documentaries.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/dd0f409d85994c2d8974dea5b41fc7b5_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Maneesh Sharma</strong> is a Senior Database Engineer at Amazon Web Services with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) Partners and customers to drive better integration.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/1a0c9bfffaac45d2a35956efc3571376_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Eesha Kumar</strong> is an Analytics Solutions Architect with Amazon Web Services . He works with customers to realize business value of data by helping them building solutions leveraging Amazon Web Services platform and tools.</p>\n"}
目录
亚马逊云科技解决方案 基于行业客户应用场景及技术领域的解决方案
联系亚马逊云科技专家
亚马逊云科技解决方案
基于行业客户应用场景及技术领域的解决方案
联系专家
0
目录
关闭