New Amazon Web Services Glue 4.0 – New and Updated Engines, More Data Formats, and More

[Amazon Web Services Glue](https://aws.amazon.com/glue/) is a scalable, serverless tool that helps you to accelerate the development and execution of your data integration and [ETL](https://aws.amazon.com/what-is/etl/) workloads. Today we are launching Glue 4.0, with updated engines, support for additional data formats, Ray support, and a lot more.

Before I dive in, just a word about versioning. Unlike most Amazon Web Services offerings, where the service team owns and has full control over the APIs, Glue includes a collection of libraries, engines, and tools developed by the open source community. Some of these components do not maintain strict backward compatibility, often in pursuit of efficiency. To make sure that changes to the components do not impact your Glue jobs, you must select a particular Glue version when you create the job.

Each version of Glue includes performance and reliability benefits in addition to the added features, and you should plan to upgrade your jobs over time to take advantage of all that Glue has to offer.

**Dive in to Glue**
Let's take a look at what's new in Glue 4.0:

**Updated Engines** – This version of Glue includes [Python 3.10](https://www.python.org/downloads/release/python-3100/) and [Apache Spark 3.3.0](https://spark.apache.org/releases/spark-release-3-3-0.html). Both engines include bug fixes and performance enhancements; Spark includes new features such as [row-level runtime filtering](https://issues.apache.org/jira/browse/SPARK-32268), [improved error messages](https://issues.apache.org/jira/browse/SPARK-38781), additional [built-in functions](https://issues.apache.org/jira/browse/SPARK-38783), and much more. Glue and [Amazon EMR](https://aws.amazon.com/emr) make use of the same Spark runtime, which has been optimized to run in the Amazon Web Services cloud and can be 2-3 times faster than the basic open source version.

**New Engine Plugins** – Glue 4.0 adds native support for the [Cloud Shuffle Service](https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-shuffle-manager.html) Plugin for Spark to help you scale your disk usage, and [Adaptive Query Execution](https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution) to dynamically optimize your queries as they run.

**Pandas Support** – [Pandas](https://pandas.pydata.org/) is an open source data analysis and manipulation tool that is built on top of Python. It is easy to learn and includes all kinds of interesting and useful [data manipulation functions](https://pandas.pydata.org/docs/reference/general_functions.html).

**New Data Formats** – Whether you are building a data lake or a data warehouse, Glue 4.0 now handles new open source data formats for sources and targets, with support for [Apache Hudi](https://hudi.apache.org/), [Apache Iceberg](https://iceberg.apache.org/), and [Delta Lake](https://delta.io/). To learn more about these new options and formats, read [Get Started with Apache Hudi using Amazon Web Services Glue by Implementing Key Design Concepts](https://aws.amazon.com/blogs/big-data/part-1-get-started-with-apache-hudi-using-aws-glue-by-implementing-key-design-concepts/).

**Everything Else** – In addition to the above items, Glue 4.0 also includes the Parquet vectorized reader, with support for additional data types and encodings. It has been upgraded to use [log4j 2](https://logging.apache.org/log4j/2.x/) and is no longer dependent on log4j 1.

**Available Now**
Glue 4.0 is available today in the US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Jakarta, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Middle East (Bahrain), and South America (Sao Paulo) Amazon Web Services Regions.

[Jeff](https://twitter.com/jeffbarr)

![image.png](https://dev-media.amazoncloud.cn/adbdce01e5134447852d80bb63897781_image.png)

## **[Jeff Barr](https://aws.amazon.com/blogs/aws/author/jbarr/)**
Jeff Barr is Chief Evangelist for Amazon Web Services. He started this blog in 2004 and has been writing posts just about non-stop ever since.
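
As noted in the versioning discussion in this post, each job is pinned to a Glue version when it is created. Here is a minimal sketch of what that might look like when assembling a request for the `create_job` API through boto3. The job name, role ARN, script location, worker settings, and the helper function itself are hypothetical placeholders and not taken from the post; the `--datalake-formats` job parameter is shown as one way to opt in to Hudi, Iceberg, or Delta Lake, assuming a Spark ETL job.

```python
from typing import Any, Dict


def build_glue_job_request(
    name: str,
    role_arn: str,
    script_location: str,
    glue_version: str = "4.0",
    datalake_formats: str = "hudi",
) -> Dict[str, Any]:
    """Build keyword arguments suitable for boto3's glue create_job call.

    All argument values are illustrative; this helper is hypothetical and
    only assembles a dictionary, it does not contact any service.
    """
    return {
        "Name": name,
        "Role": role_arn,
        # Pin the job to a specific Glue version at creation time.
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated list such as "hudi,iceberg,delta".
            "--datalake-formats": datalake_formats,
        },
        # Assumed sizing for illustration only.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


request = build_glue_job_request(
    name="example-glue4-job",  # hypothetical job name
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
    script_location="s3://example-bucket/scripts/job.py",  # hypothetical
)
print(request["GlueVersion"])

# To create the job for real (requires boto3 and valid credentials):
# import boto3
# boto3.client("glue").create_job(**request)
```

Keeping the version in one place like this makes a later upgrade a one-line change, though each job should still be tested against the new engines before switching.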