Open source news and updates #134

{"value":"## November 7th, 2022 - Instalment #134\n\n### Welcome\n\nWelcome to the Amazon Web Services open source newsletter, edition #134. This week's newsletter was featured in the latest ++[Build on Open Source on twitch.tv/aws](https://www.twitch.tv/videos/1643077489?filter=archives&sort=time)++, so I hope some of you were able to tune in and watch.\n\nNew projects that we featured include \"enclaver\", a toolkit to make working with enclaves easier, \"s3crets_scanner\", a new secrets scanning tool, \"sandbox-accounts-for-events\", a way to easily vend temporary environments, \"frontend-discovery\", which helps you define and drive adoption of a frontend discovery pattern, \"cf-sam-openapi-file-organization-demo\", a tool to help you get started with API development, \"decoupling-microservices-lambda-amazonmq-rabbitmq\", a sample solution to get you started with microservices and RabbitMQ, \"how-to-write-more-correct-software-workshop\", a workshop to get you developing better software, and more!\n\nWe also have content on Amazon Web Services ParallelCluster, Apache Hudi, Apache Iceberg, Apache Flink, Hive, PrestoDB, Trino, [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail), Apache Kafka, Babelfish for Aurora PostgreSQL, Firecracker, MySQL, ArgoCD, PostgreSQL, Fluentbit, Amazon Web Services Distro for OpenTelemetry, and more, so be sure to check out all these great posts this week.\n\nFinally, make sure you review the events section, as there are plenty of open source events to put on your radar over the coming weeks. I will be speaking at the Open Source Edinburgh event on Wednesday, so I hope to see some of you there.\n\n### **Feedback**\n\nPlease let me know how we can improve this newsletter, as well as how Amazon Web Services can better work with open source projects and technologies, by completing ++[this very short survey](https://eventbox.dev/survey/NUSZ91Z)++ that will probably take you less than 30 seconds to complete. 
Thank you so much!\n\n### **Celebrating open source contributors**\n\nThe articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.\n\nSo thank you to the following open source heroes: John Russell, Beny Ibrani, Eilon Harel, Eugene Yahubovich, Richard Case, Faizal Khan, Ashish Bhatia, Jagadeesh Chitikesi, Benson Kwong, Stanley Chukwuemeke, Baruch Assif Osoveskiy, and Kehinde Otubamowo\n\n## **Latest open source projects**\n\nThe great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.\n\n### Tools\n\n### **enclaver**\n\n++[enclaver](https://aws-oss.beachgeek.co.uk/27k)++ is a new open source toolkit created to enable easy adoption of software enclaves (such as those provided by Amazon Web Services Nitro Enclaves) for new and existing backend software. Make sure you check out the project documentation, which outlines in more detail some of the aspects of enclaves.\n\n![diagramenclavercomponents.png](https://dev-media.amazoncloud.cn/f35a5327b6a54804a25f6dfd05ea924e_diagram-enclaver-components.png)\n\nEugene Yahubovich, the project founder, has also put together a very nice blog post that dives deeper into use cases and how to get started. Go read, ++[Introducing Enclaver: an open-source tool for building, testing and running code within secure enclaves](https://aws-oss.beachgeek.co.uk/27n)++. 
This is our project of the week.\n\n### **s3crets_scanner**\n\n++[s3crets_scanner](https://aws-oss.beachgeek.co.uk/27l)++ is an open source tool from Eilon Harel that is designed to provide a complementary layer for the [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) Security Best Practices by proactively hunting secrets in public S3 buckets. It can be executed as a scheduled task or run on demand.\n\n![scanner_gif.gif](https://dev-media.amazoncloud.cn/debd82a2b9d84facb758886b73a401e1_scanner_gif.gif)\n\nEilon Harel has also put together a blog post diving deeper into this, in the post ++[Hunting After Secrets Accidentally Uploaded To Public S3 Buckets](https://aws-oss.beachgeek.co.uk/27m)++\n\n![0 SwYWjtV0rRFecr4.jpg](https://dev-media.amazoncloud.cn/c86d1919c2fe4b1d9ec838ef5b589cf2_0%20Sw-YWjtV0rRFecr4.jpg)\n\n### **sandbox-accounts-for-events**\n\n++[sandbox-accounts-for-events](https://aws-oss.beachgeek.co.uk/27q)++ \"Sandbox Accounts for Events\" lets you provide multiple temporary Amazon Web Services accounts to a number of authenticated users simultaneously via a browser-based GUI. It uses the concept of \"leases\" to create temporary access tickets, and lets you define expiration periods as well as a maximum budget spend per leased Amazon Web Services account. Check out the docs for some example use cases where you might find a tool like this useful, as well as to understand more about how this works under the hood.\n\n![tableleases.png](https://dev-media.amazoncloud.cn/767ff650ccaf40d594a330f3d7cc2fc1_table-leases.png)\n\n### **frontend-discovery**\n\n++[frontend-discovery](https://aws-oss.beachgeek.co.uk/27r)++ The aim of this project is to define and drive adoption of a frontend discovery pattern, with a primary focus on client-side rendered (CSR), server-side rendered (SSR) and edge-side rendered (ESR) micro-frontends. 
The frontend discovery pattern improves the development experience when developing, testing, and delivering micro-frontends by making use of a shareable configuration describing the entry point of micro-frontends, as well as additional metadata that can be used to deploy safely in every environment.\n\nCheck out the readme to find out more about the motives behind the project, and dive into it with an example.\n\n### **Demos, Samples, Solutions and Workshops**\n\n### **mapper-for-fhir**\n\n++[mapper-for-fhir](https://aws-oss.beachgeek.co.uk/27p)++ FHIR is a standard for health care data exchange, and this repo provides assets that allow for the automated deployment of an HL7v2 to FHIR mapping solution, leveraging native Amazon Web Services services. The repo provides a CDK application that simplifies deployment.\n\n### **cf-sam-openapi-file-organization-demo**\n\n++[cf-sam-openapi-file-organization-demo](https://aws-oss.beachgeek.co.uk/27w)++ This project is the API back-end for a widget tracking website. Widget tracking is simplistic: widgets have only a unique name and a colour descriptor for properties. If you want to explore how to approach API design, then this is a good repo to explore.\n\n![ArchitectureDiagram.png](https://dev-media.amazoncloud.cn/f0c5675175e1481380fe504fa93d0c36_Architecture-Diagram.png)\n\n### **sam-accelerate-nested-stacks-demo**\n\n++[sam-accelerate-nested-stacks-demo](https://aws-oss.beachgeek.co.uk/27v)++ This repository shows how to use CloudFormation nested stacks with Amazon Web Services SAM Accelerate. Nested stacks are stacks created as part of other stacks. In this demo repo, there are four separate stacks that make up the entire solution. Amazon Web Services SAM manages all four as CloudFormation nested stacks. 
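As a rough sketch of how a parent SAM template can reference child stacks (the resource and file names here are hypothetical, not necessarily the demo's actual layout), each nested stack appears as a resource in the parent template:

```yaml
# Hypothetical parent template sketch; check the repo for the real files.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  OrdersStack:
    Type: AWS::Serverless::Application   # deployed as a CloudFormation nested stack
    Properties:
      Location: orders/template.yaml     # child template path (made up for illustration)
  UsersStack:
    Type: AWS::Serverless::Application
    Properties:
      Location: users/template.yaml
```

Because each child is just a resource in the parent, SAM can deploy or update the whole tree in one command while CloudFormation tracks each nested stack separately.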
During development, we show how to use SAM Accelerate to quickly update resources, shortening the development loop.\n\n\n![orders.png](https://dev-media.amazoncloud.cn/1b1e7e05f526495baaca10f744f6f528_orders.png)\n\n\n### **decoupling-microservices-lambda-amazonmq-rabbitmq**\n\n++[decoupling-microservices-lambda-amazonmq-rabbitmq](https://aws-oss.beachgeek.co.uk/27u)++ This project is a solution architecture that demonstrates decoupling microservices with [Amazon MQ](https://aws.amazon.com/cn/amazon-mq/?trk=cndc-detail) for RabbitMQ and Amazon Web Services Lambda. A decoupled application architecture allows each component to perform its tasks independently, and a change in one service shouldn't require a change in the other services.\n\n![mq_decoupled_apps.png](https://dev-media.amazoncloud.cn/2f2fafa35f474388991dc8ab011f258a_mq_decoupled_apps.png)\n\n### **how-to-write-more-correct-software-workshop**\n\n++[how-to-write-more-correct-software-workshop](https://aws-oss.beachgeek.co.uk/27t)++ Last week I featured duvet, a tool to help you codify and automate validation that your software honours RFC specs. This repo contains a workshop that walks you through a practical example of that, but also features Dafny, a programming language that formally verifies your implementation matches your specification. I have this on my weekend to-do list.\n\n## **Amazon Web Services and Community blog posts**\n\n### **Kubernetes**\n\nBenson Kwong has been busy putting this blog post together, ++[Multi-cluster management for Kubernetes with Cluster API and Argo CD](https://aws-oss.beachgeek.co.uk/27z)++, where he introduces what Cluster API is and explains why you can use this useful tool for managing multiple Kubernetes clusters instead of struggling with different APIs and tool sets to maintain them. He also covers how you can integrate ArgoCD to add that sprinkle of continuous delivery with Git as your source of truth. 
[hands on]\n\n![kwong.png](https://dev-media.amazoncloud.cn/b1fa57b263f848bb936c02e8946bf6b0_kwong.png)\n\n### **CoreWCF**\n\nCoreWCF is a port of the service side of Windows Communication Foundation (WCF) to .NET Core. The goal of this project is to enable existing WCF services to move to .NET Core. In the post ++[Running your modern CoreWCF application on Amazon Web Services](https://aws-oss.beachgeek.co.uk/27y)++, Ashish Bhatia and Jagadeesh Chitikesi show you how to deploy a CoreWCF application on an Amazon Linux Graviton2 instance. [hands on]\n\n### **MySQL**\n\nIn the post ++[Enable change data capture on Amazon RDS for MySQL applications that are using XA transactions](https://aws-oss.beachgeek.co.uk/280)++, Stanley Chukwuemeke, Baruch Assif Osoveskiy, and Kehinde Otubamowo have collaborated to present a solution that safely replicates change streams from a MySQL database using XA transactions to a downstream OpenSearch cluster. [hands on]\n\n![DBBLOG2409archdiag11.png](https://dev-media.amazoncloud.cn/7454a19155784c1fbfbdd083fd04e7e2_DBBLOG-2409-arch-diag-1-1.png)\n\n### **Other posts and quick reads**\n\n- ++[Create a Multi-Region Python Package Publishing Pipeline with Amazon Web Services CDK and CodePipeline](https://aws-oss.beachgeek.co.uk/27x)++ walks you through how to deploy a CodePipeline pipeline to automate the publishing of Python packages to multiple CodeArtifact repositories in separate regions [hands on]\n\n![devops_2053_1.png](https://dev-media.amazoncloud.cn/5f72721d3ea045b1849e44a11374588e_devops_2053_1.png)\n\n- ++[Microservice observability with Amazon OpenSearch Service part 1: Trace and log correlation](https://aws-oss.beachgeek.co.uk/281)++ is a two-part blog that uses a sample microservice to show you how you can implement observability using a number of open source tools such as Fluentbit, Amazon Web Services Distro for OpenTelemetry, and OpenSearch [hands 
on]\n\n![BDB2223P1image001.jpg](https://dev-media.amazoncloud.cn/98a1772b6a1d471b9bf99f010ecd2f3d_BDB-2223-P1-image001.jpg)\n\n\n\n- ++[Migrate Oracle hierarchical queries to Amazon Aurora PostgreSQL](https://aws-oss.beachgeek.co.uk/282)++ demonstrates via sample queries how you can migrate Oracle hierarchical queries, which rely on a number of Oracle-specific keywords, to PostgreSQL [hands on]\n- ++[What to consider when modernizing APIs with GraphQL on Amazon Web Services](https://aws-oss.beachgeek.co.uk/283)++ provides a good primer on how GraphQL works and how integrating it with Amazon Web Services services can help you build modern applications\n- ++[Managing Computer Labs on Amazon AppStream 2.0 with Open Source Virtual Application Management](https://aws-oss.beachgeek.co.uk/284)++ provides a hands-on guide to using an open source project previously featured in this newsletter to help administrators programmatically create AppStream 2.0 images [hands on]\n\n![VAMarchitecturediagram.png](https://dev-media.amazoncloud.cn/c574c1cc1c824eaaba08f3646ffaa33f_VAM-architecture-diagram.png)\n\n## **Quick updates**\n\n### **PHP**\n\nAmazon Web Services App Runner now supports PHP 8.1, Go 1.18, .NET 6, and Ruby 3.1 managed runtimes for building and running web applications and APIs. These runtimes enable you to leverage the App Runner “build from source” capability to build and deploy directly from your source code repository without needing to learn the internals of building and managing your own container images.\n\nStarting today, you can build and run your services based on PHP 8.1, Go 1.18, .NET 6, and Ruby 3.1 directly from your source code on App Runner. All these new managed runtimes in App Runner are active long-term support (LTS) major versions.\n\n### **Apache Kafka**\n\n[Amazon Managed Streaming for Apache Kafka](https://aws.amazon.com/cn/msk/?trk=cndc-detail) (MSK) now offers Tiered Storage, which brings a virtually unlimited, low-cost storage tier. 
Tiered Storage lets you store and process data using the same Kafka APIs and clients, while saving your storage costs by 50% or more over existing MSK storage options. Tiered Storage makes it easy and cost-effective when you need a longer safety buffer to handle unexpected processing delays or to build new stream processing applications. You can now scale your compute and storage independently, simplifying operations.\n\nAmazon MSK Connect also now supports Private DNS hostnames for enhanced security. With Private DNS hostname support in MSK Connect, you can configure connectors to reference public or private domain names. Connectors will use the DNS servers configured in your VPC’s DHCP option set to resolve domain names. You can now use MSK Connect to privately connect with databases, data warehouses and other resources in your VPC to comply with your security needs.\n\n### **[Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail)**\n\n[Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) 6.8 has seen a number of updates that you should be aware of.\n\n*PrestoDB and Trino*\n\nWith PrestoDB and Trino on EMR 6.8, users benefit from a configuration setting called strict mode that prevents cost overruns due to long running queries. Customers have told us that poorly written SQL queries can sometimes run for a long time and consume resources from other business-critical workloads. To help administrators take action on such queries, we are introducing a strict mode setting that allows warning on or rejecting certain types of queries. Examples include queries without predicates on partitioned columns that result in large table scans, queries that involve cross joins between large tables, or queries that sort a large number of rows without a limit. You can set up the strict mode configuration during cluster creation, and you can also override the setting using session properties. 
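As a toy illustration of the kinds of checks strict mode is described as making (hypothetical logic in plain Python, not EMR's actual implementation), a validator might flag unbounded sorts, cross joins, and missing partition predicates:

```python
# Illustrative sketch only: a toy "strict mode" style query check.
# This is NOT Amazon EMR's implementation, just the idea in Python.
import re

def strict_mode_warnings(sql, partition_columns):
    """Return warnings for the query patterns strict mode is described as catching."""
    warnings = []
    lowered = sql.lower()
    # Sorting a result without a LIMIT clause can sort a huge number of rows.
    if "order by" in lowered and "limit" not in lowered:
        warnings.append("ORDER BY without LIMIT may sort a very large number of rows")
    # Cross joins between (potentially large) tables are expensive.
    if "cross join" in lowered:
        warnings.append("CROSS JOIN between large tables can be very expensive")
    # No predicate on any partitioned column usually means a full table scan.
    if not any(re.search(rf"\b{re.escape(c)}\b", lowered) for c in partition_columns):
        warnings.append("no predicate on a partitioned column; large table scan likely")
    return warnings
```

For example, `strict_mode_warnings("SELECT * FROM t ORDER BY x", ["dt"])` flags both the unbounded sort and the missing partition predicate, while a query filtered on `dt` with a limit passes cleanly.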
You can apply strict mode checks to select, insert, create table as select and explain analyse query types.\n\nWe are also excited to announce that [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) PrestoDB and Trino have added new features to handle Spot interruptions that help you run your queries cost effectively and reliably. Spot Instances in [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) allow you to run big data workloads on spare [Amazon EC2 ](https://aws.amazon.com/cn/ec2/?trk=cndc-detail)capacity at a reduced cost compared to On-Demand instances. However, [Amazon EC2 ](https://aws.amazon.com/cn/ec2/?trk=cndc-detail)can interrupt Spot Instances with a two-minute notification, and PrestoDB/Trino queries fail when Spot nodes are terminated. This has meant that customers were unable to run such workloads on Spot Instances and take advantage of lower costs. In EMR 6.7, we added a new capability to the PrestoDB/Trino engine to detect Spot interruptions and determine whether the existing queries can complete within two minutes on those nodes. If the queries cannot finish, we fail quickly and retry the queries on different nodes. The [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) PrestoDB/Trino engine also does not schedule new queries on Spot nodes that are about to be reclaimed. With these two new features, you will get the best of both worlds: improved resiliency with the PrestoDB/Trino engine on [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail), and running queries economically on Spot nodes.\n\n*Hive*\n\nHive users run the Metastore check command with the repair table option (MSCK REPAIR TABLE) to update the partition metadata in the Hive metastore for partitions that were directly added to or removed from the file system (S3 or HDFS). When run, the MSCK REPAIR command must make a file system call per partition to check whether it exists. This step could take a long time if the table has thousands of partitions. 
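To see why per-partition checks hurt, here is a toy model (assumed numbers, purely illustrative, not EMR's actual call pattern) comparing one file system call per partition against a single paginated listing of the table location:

```python
# Toy model of MSCK REPAIR's file system traffic; illustrative only.

def calls_naive(num_partitions):
    # One existence check per partition.
    return num_partitions

def calls_batched(num_partitions, page_size=1000):
    # One paginated listing of the table location instead (ceiling division).
    return -(-num_partitions // page_size)

if __name__ == "__main__":
    n = 10_000
    print(f"{n} partitions: naive={calls_naive(n)} calls, batched={calls_batched(n)} calls")
```

On a 10k-partition table the naive approach makes 10,000 calls versus a handful of listing pages, which is the intuition behind the speed-up described next.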
In EMR 6.5, we introduced an optimisation to the MSCK REPAIR command in Hive to reduce the number of S3 file system calls when fetching partitions. This improves the performance of the MSCK command (~15-20x on 10k+ partitions) thanks to the reduced number of file system calls, especially when working on tables with a large number of partitions. Previously, you had to enable this feature by explicitly setting a flag. Starting with [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) 6.8, we further reduced the number of S3 filesystem calls to make MSCK REPAIR run faster, and enabled this feature by default.\n\nIn addition to the MSCK REPAIR TABLE optimisation, we would also like to share that [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) Hive users can now use Parquet modular encryption to encrypt and authenticate sensitive information in Parquet files. It is a challenging task to protect the privacy and integrity of sensitive data at scale while keeping the Parquet functionality intact. Data protection solutions such as encrypting files or the storage layer are currently used to encrypt Parquet files; however, they can lead to performance degradation. With Parquet modular encryption, you can not only enable granular access control but also preserve the Parquet optimisations such as columnar projection, predicate pushdown, encoding and compression. Using Parquet modular encryption, [Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) Hive users can protect both Parquet data and metadata, use different encryption keys for different columns, and perform partial encryption of only sensitive columns. It also allows clients to check the integrity of the data retrieved while keeping all Parquet optimisations. \n\n*Apache Flink*\n\n[Amazon EMR](https://aws.amazon.com/cn/emr/?trk=cndc-detail) includes Apache Flink 1.15.1. This feature is available on EMR on EC2.\nApache Flink is an open source framework and engine for processing data streams. 
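Event-time processing in Flink revolves around watermarks, which feature heavily in this release. As a rough pure-Python illustration of the idea (a conceptual sketch, not Flink's API): a bounded-out-of-orderness watermark trails the largest timestamp seen so far by the allowed lateness, asserting that no earlier event is still expected.

```python
# Pure-Python illustration of a bounded-out-of-orderness watermark
# generator; NOT the Apache Flink API, just the concept.

def watermarks(event_timestamps, max_out_of_orderness):
    """Yield (event_ts, watermark) pairs. The watermark trails the maximum
    timestamp seen so far by the allowed out-of-orderness, so it only
    advances when genuinely newer events arrive."""
    max_ts = float("-inf")
    for ts in event_timestamps:
        max_ts = max(max_ts, ts)          # out-of-order events never move it back
        yield ts, max_ts - max_out_of_orderness
```

Running this over the out-of-order stream `[1, 5, 3, 9]` with an allowed lateness of 2 shows the watermark holding steady at 3 while the late event `3` arrives, then jumping forward with `9`.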
Apache Flink 1.15.1 on EMR 6.8 includes 62 bug fixes, vulnerability fixes, and minor improvements over Flink 1.15.0. Key features include:\n\n- Watermark alignment (Beta) across data sources: Event-time processing in Flink depends on special timestamped elements, called watermarks, that are inserted into the stream either by the data sources or by a watermark generator. A watermark with a timestamp t can be understood as an assertion that all events with timestamps < t have already arrived. Watermark alignment is useful when processing sources with different velocities of events, e.g. when one source is idle or one source emits records relatively faster than others; you can enable watermark alignment for each source separately. Flink aligns watermarks by pausing the highest velocity source and continuing to read records from the other sources until the watermarks are aligned.\n- SQL version upgrade: Introducing JSON plans, a JSON representation of a SQL query's execution plan. Today, version upgrades can alter the topology of SQL queries, which can introduce snapshot incompatibility across versions. This makes upgrading Flink versions challenging. With this feature, both the Table API and SQL will provide a way to compile and execute a plan ensuring the same topology for SQL queries throughout different versions, making it more reliable to upgrade to future versions. Users who want to give it a try can create a JSON plan that can then be used to restore a Flink job based on the old operator structure.\n\n*Apache Hudi and Apache Iceberg*\n\nAmazon EMR release 6.8 now supports Apache Hudi 0.11.1 and Apache Iceberg 0.14.0. 
You can use these frameworks on Amazon EMR on EC2, Amazon EMR on EKS, and Amazon EMR Serverless.\n\nApache Hudi 0.11.1 on Amazon EMR 6.8 includes support for Spark 3.3.0 and brings a number of improvements:\n\n- Multi-Modal Index support and Data Skipping with the Metadata Table, which allow you to add bloom filter and column stats indexes to tables and can significantly improve query performance\n- an Async Indexer service, which allows users to create different kinds of indices (e.g. files, bloom filters, and column stats) in the metadata table without blocking ingestion\n- Spark SQL improvements, adding support for updating or deleting records in Hudi tables using non-primary-key fields, and time travel queries via the TIMESTAMP AS OF syntax\n- Flink integration improvements, with support for both Flink 1.13.x and 1.14.x, and support for complex data types such as Map and Array\n\nIn addition, Hudi 0.11.1 includes bug fixes over Hudi 0.11.0, which was available in Amazon EMR release 6.7.\n\nApache Iceberg 0.14.0 on Amazon EMR 6.8 includes support for Spark 3.3.0 and adds:\n\n- merge-on-read support for MERGE and UPDATE statements\n- support for rewriting partitions using Z-order, which lets you re-organise partitions to be efficient with query predicates on multiple columns and keep similar data together\n- several performance improvements for scan planning in Spark queries\n- support for row group skipping using Parquet bloom filters\n\n### **Amazon Web Services ParallelCluster**\n\nAmazon Web Services ParallelCluster 3.3 is now generally available and introduces a new feature for compute resource optimisation. With this new feature, you can map a compute resource to a list of Amazon EC2 instance types with an allocation strategy to optimise compute capacity for your HPC jobs. 
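In configuration terms, that mapping looks roughly like the snippet below (an illustrative sketch; the queue and resource names and instance types are made up, so check the ParallelCluster docs for the exact schema):

```yaml
# Illustrative ParallelCluster 3.3 queue snippet; hypothetical values.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      AllocationStrategy: lowest-price   # strategy applied across the listed types
      ComputeResources:
        - Name: flexible-cr
          Instances:                     # one compute resource, several instance types
            - InstanceType: c5.9xlarge
            - InstanceType: c5a.8xlarge
            - InstanceType: c5n.9xlarge
          MinCount: 0
          MaxCount: 100
```

Listing several similarly-sized instance types gives the scheduler more pools to draw from, so jobs are less likely to wait on a single constrained instance type.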
Other features include updates that support dynamically mounting shared storage, Slurm accounting, and Amazon EC2 On-Demand Capacity Reservations (ODCRs).\n\nCheck out the announcement for more details, ++[Amazon Web Services ParallelCluster 3.3: multiple instance type allocation and other top requested features](https://aws-oss.beachgeek.co.uk/27h)++\n\nMatt Vaughn has also published a blog post, ++[Support for Instance Allocation Flexibility in Amazon Web Services ParallelCluster](https://aws-oss.beachgeek.co.uk/27i)++ 3.3, where he explains in detail the newly announced “multiple instance type allocation” feature in ParallelCluster 3.3.0. This feature enables you to specify multiple instance types to use when scaling up the compute resources for a Slurm job queue. Your HPC workloads will have more paths forward to get access to the EC2 capacity they need, helping you to get more computing done.\n\n## **Videos of the week**\n\n### **Firecracker**\n\nIn this lightning talk from KubeCon, Richard Case from SUSE brings you up to speed with what microVMs are, how they can be useful, and provides plenty of examples. This is great stuff for building a better understanding of how you might then use Firecracker, an open source microVM project from Amazon Web Services.\n\n<video src=\"https://dev-media.amazoncloud.cn/9e0b0ec0676445b4b7cb91376288ef25_Lightning%20Talk%EF%BC%9A%20What%20Are%20MicroVMs%EF%BC%9F%20And%20Why%20Should%20I%20Care%EF%BC%9F%20-%20Richard%20Case%2C%20SUSE.mp4\" class=\"manvaVedio\" controls=\"controls\" style=\"width:160px;height:160px\"></video>\n\n\n### **Babelfish for Aurora PostgreSQL**\n\nJoin fellow Developer Advocate John Russell as he shows you how to set up a database server using the combination of Babelfish and Aurora, connect to the database, and run both PostgreSQL and T-SQL statements. 
Babelfish for PostgreSQL is an open source project that provides a compatibility layer for the SQL dialect (T-SQL) used by Microsoft SQL Server. Babelfish helps migrate database applications onto PostgreSQL with minimal changes to the application code.\n\n<video src=\"https://dev-media.amazoncloud.cn/0566de9e8c594c0ea93ae67579a3463c_Get%20Up%20and%20Running%20with%20Babelfish%20for%20Aurora%20PostgreSQL%20%EF%BD%9C%20Amazon%20Web%20Services.mp4\" class=\"manvaVedio\" controls=\"controls\" style=\"width:160px;height:160px\"></video>\n\n### **Build on Open Source**\n\nFor those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. Expect plenty of code, demos and hopefully laughs.\n\nWe have put together a playlist so that you can easily access all the other episodes of the Build on Open Source show: ++[Build on Open Source playlist](https://aws-oss.beachgeek.co.uk/24u)++\n\n\n## **Events for your diary**\n\n### **OpenSearch - Development Backlog & Triage Meeting Security**\n### **7th November - 12pm PT**\n\nThe OpenSearch engineering team working on the Security repo have opened up their Backlog & Triage meetings to the public. This is a great opportunity to find out more about the inner workings of open source projects such as OpenSearch. Don't worry if you cannot make this meeting, as they are currently scheduled from the 7th of November out through December 19th.\n\nCheck out the entire ++[list here](https://aws-oss.beachgeek.co.uk/285)++.\n\n### **Open Source & Amazon Web Services IoT**\n### **7th November, 4pm IST**\n\nInternet of Things and other categories of hardware devices with edge computing capabilities that communicate over the internet to perform remote actions require stable, easy-to-use and, most importantly, secure operating software that can be audited independently. 
In this session, the presenter will look at ways that Amazon Web Services supports and contributes to the open-source community to make these devices more resilient and feature-rich through solutions such as Amazon Web Services Greengrass and Amazon Web Services IoT Core.\n\nAmazon Web Services Hero Faizal Khan will be your host, and you can sign up ++[via the registration page here.](https://aws-oss.beachgeek.co.uk/27o)++\n\n### **Open Source Edinburgh**\n### **9th November, 5:30pm, Scott Logic in Edinburgh**\n\nI will be talking at the Open Source Edinburgh meet-up this week, and you can find details on the location and how to reserve your spot by clicking on the ++[meetup.com link](https://www.meetup.com/open-source-edinburgh/events/289090920/)++. Hope to see some of you there.\n\n### **Running an Open Source Transcoding Server on [Amazon EKS](https://aws.amazon.com/cn/eks/?trk=cndc-detail)**\n### **Friday, November 18th, 19:00 WIB**\n\nJoin Beny Ibrani and the Amazon Web Services User Group Indonesia for this session (in the local language, I believe), where Beny will show you how you can use open source transcoding software running on [Amazon EKS](https://aws.amazon.com/cn/eks/?trk=cndc-detail).\n\nThis session will be streamed on YouTube, so ++[check it out here](https://aws-oss.beachgeek.co.uk/27j)++\n\n\n### **Build on Amazon Web Services Open Source**\n### **November 18th, 9am BST**\n\nJoin us for the sixth episode of the Build on Open Source series, featuring a live round up of the latest projects and news as well as a special guest speaker. We have another special guest lined up, and we will announce this next week. Follow the show on @buildonopen for more details. 
Check it out on ++[https://twitch.tv/aws](https://twitch.tv/aws)++\n\n### **Amazon Web Services Elastic Kubernetes Service (EKS) Workshop**\n### **November 10th, 5pm, London**\n\nJoin us for an interactive workshop on containers, Docker, Fargate and [Amazon EKS](https://aws.amazon.com/cn/eks/?trk=cndc-detail), hosted by ClearScale and Amazon Web Services. This live, virtual workshop includes three hours of interactive presentation and hands-on lab work. You will take part in the setup and deployment of containers using EKS. Follow along and work directly with Amazon Web Services professionals and ClearScale (an Amazon Web Services Premier Tier Services Partner) in this Level 200 training session.\n\nYou can find out more about this event by ++[checking out the event page and signing up.](https://aws-oss.beachgeek.co.uk/22y)++\n\n### **re:Invent**\n### **November 28th - December 3rd, Las Vegas**\n\nre:Invent is only a few weeks away, so I want to share a few things that will hopefully be of interest.\n\nFirst up, we will be running the Build On Live stream throughout re:Invent and we would love to feature you! If you are going to re:Invent, or you know a community member who is and think they would absolutely love to attend the livestream, we want to hear from you. Please nominate a community member you want to hear from during Build On Live ++[using this survey.](https://eventbox.dev/survey/6B0ED1J)++\n\nSecond, for a handy way to look at all the amazing open source sessions, check out this ++[dashboard](https://aws-oss.beachgeek.co.uk/252)++ [sign up required]. I would love to hear which ones you are excited about, so please let me know in the comments or via Twitter. If you want to know my top three must-watch sessions, this is what I would attend (sadly, as an Amazon Web Services employee I am not allowed to attend sessions):\n\n\n\n1. 
OPN306 Amazon Web Services Lambda Powertools: Lessons from the road to 10 million downloads - Heitor Lessa is going to deliver an amazing session on the journey from idea to one of the most loved and used open source tools for Amazon Web Services Lambda users\n2. BOA204 When security, safety, and urgency all matter: Handling Log4Shell - Cannot wait for this session from Abbey Fuller, who will walk us through how we managed this incident\n3. OPN202 Maintaining the Amazon Web Services Amplify Framework in the open - Matt Auerbach and Ashish Nanda are going to share details on how Amplify engineering managers work with the OSS community to build open-source software\n\nThere are many other great open source sessions, and hopefully I will put together a more comprehensive list as we approach re:Invent.\n\n### OpenSearch\n### Every other Tuesday, 3pm GMT\n\nThis regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome, and they cover and welcome talks on topics including search, logging, log analytics, and data visualisation.\n\nSign up to the next session, ++[OpenSearch Community Meeting](https://aws-oss.beachgeek.co.uk/1az)++\n\n## **Stay in touch with open source at Amazon Web Services**\n\nI hope this summary has been useful. 
Remember to check out the ++[Open Source homepage](https://aws.amazon.com/opensource/?opensource-all.sort-by=item.additionalFields.startDate&opensource-all.sort-order=asc)++ to keep up to date with all our activity in open source by following us on ++[@AWSOpen](https://twitter.com/AWSOpen)++\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","render":"<h2><a id=\\"November_7th_2022__Instalment_134_0\\"></a>November 7th, 2022 - Instalment #134</h2>\\n<h3><a id=\\"Welcome_2\\"></a>Welcome</h3>\\n<p>Welcome to the Amazon Web Services open source newsletter, edition #134. This weeks newsletter was featured in the latest <ins><a href=\\"https://www.twitch.tv/videos/1643077489?filter=archives&amp;sort=time\\" target=\\"_blank\\">Build on Open Source on twitch.tv/aws</a></ins>, so I hope some of you were able to tune in and watch.</p>\n<p>New projects that we featured include “enclaver”, a toolkit to make working with enclaves easier, “s3crets_scanner” a new secrets scanning tool, “sandbox-accounts-for-events” a way to easily vend temporary environments, “frontend-discovery” helps you define and drive adoption of a frontend discovery patterns, “cf-sam-openapi-file-organization-demo”, a tool to help you get started with API development, “decoupling-microservices-lambda-amazonmq-rabbitmq” a sample solution to get you started on how to use micro services with RabbitMQ, “how-to-write-more-correct-software-workshop” a workshop to get you developing better software, and more!</p>\n<p>We also have content on Amazon Web Services ParallelCluster, Apache Hudi, Apache Iceberg, Apache Flin, Hive, PrestoDB, Trino, Amazon EMR, Apache Kafka, Babelfish for Aurora PostgreSQL, Firecracker, MySQL, ArgoCD, PostgreSQL, Fluentbit, Amazon Web Services Distro for OpenTelemetry, and more so be sure to check out all these great posts this week.</p>\n<p>Finally, 
make sure you review the events section as there are plenty of open source events on your radar over the coming weeks. I will be speaking at the Open Source Edinburgh event on Wednesday, so I hope to see some of you there.</p>\n<h3><a id=\\"Feedback_12\\"></a><strong>Feedback</strong></h3>\\n<p>Please let me know how we can improve this newsletter, as well as how Amazon Web Services can better work with open source projects and technologies, by completing <ins><a href=\\"https://eventbox.dev/survey/NUSZ91Z\\" target=\\"_blank\\">this very short survey</a></ins>, which will probably take you less than 30 seconds to complete. Thank you so much!</p>\n<h3><a id=\\"Celebrating_open_source_contributors_16\\"></a><strong>Celebrating open source contributors</strong></h3>\\n<p>The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.</p>\n<p>So thank you to the following open source heroes: John Russell, Beny Ibrani, Eilon Harel, Eugene Yahubovich, Richard Case, Faizal Khan, Ashish Bhatia, Jagadeesh Chitikesi, Benson Kwong, Stanley Chukwuemeke, Baruch Assif Osoveskiy, and Kehinde Otubamowo.</p>\n<h2><a id=\\"Latest_open_source_projects_22\\"></a><strong>Latest open source projects</strong></h2>\\n<p>The great thing about open source projects is that you can review the source code. 
If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions, or even submit a contribution.</p>\n<h3><a id=\\"Tools_26\\"></a>Tools</h3>\\n<h3><a id=\\"enclaver_28\\"></a><strong>enclaver</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27k\\" target=\\"_blank\\">enclaver</a></ins> is a new open source toolkit created to enable easy adoption of software enclaves (such as those provided by Amazon Web Services Nitro Enclaves) for new and existing backend software. Make sure you check out the project documentation, which outlines some of the aspects of enclaves in more detail.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/f35a5327b6a54804a25f6dfd05ea924e_diagram-enclaver-components.png\\" alt=\\"diagramenclavercomponents.png\\" /></p>\n<p>Eugene Yahubovich, the project founder, has also put together a very nice blog post that dives deeper into use cases and how to get started. Go read <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27n\\" target=\\"_blank\\">Introducing Enclaver: an open-source tool for building, testing and running code within secure enclaves</a></ins>. This is our project of the week.</p>\n<h3><a id=\\"s3crets_scanner_36\\"></a><strong>s3crets_scanner</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27l\\" target=\\"_blank\\">s3crets_scanner</a></ins> is an open source tool from Eilon Harel that is designed to provide a complementary layer for the Amazon S3 Security Best Practices by proactively hunting secrets in public S3 buckets. 
It can be executed as a scheduled task or run on demand.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/debd82a2b9d84facb758886b73a401e1_scanner_gif.gif\\" alt=\\"scanner_gif.gif\\" /></p>\n<p>Eilon Harel has also put together a blog post diving deeper into this, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27m\\" target=\\"_blank\\">Hunting After Secrets Accidentally Uploaded To Public S3 Buckets</a></ins>.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/c86d1919c2fe4b1d9ec838ef5b589cf2_0%20Sw-YWjtV0rRFecr4.jpg\\" alt=\\"0 SwYWjtV0rRFecr4.jpg\\" /></p>\n<h3><a id=\\"sandboxaccountsforevents_46\\"></a><strong>sandbox-accounts-for-events</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27q\\" target=\\"_blank\\">sandbox-accounts-for-events</a></ins> “Sandbox Accounts for Events” allows you to provide multiple, temporary Amazon Web Services accounts to a number of authenticated users simultaneously via a browser-based GUI. It uses the concept of “leases” to create temporary access tickets and allows you to define expiration periods as well as a maximum budget spend per leased Amazon Web Services account. Check out the docs for some example use cases where you might find a tool like this useful, as well as to understand more about how this works under the hood.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/767ff650ccaf40d594a330f3d7cc2fc1_table-leases.png\\" alt=\\"tableleases.png\\" /></p>\n<h3><a id=\\"frontenddiscovery_52\\"></a><strong>frontend-discovery</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27r\\" target=\\"_blank\\">frontend-discovery</a></ins> The aim of this project is to define and drive adoption of a frontend discovery pattern, with a primary focus on client-side rendered (CSR), server-side rendered (SSR) and edge-side rendered (ESR) micro-frontends. 
The frontend discovery pattern improves the development experience when developing, testing, and delivering micro-frontends by making use of a shareable configuration describing the entry point of micro-frontends, as well as additional metadata that can be used to deploy safely in every environment.</p>\n<p>Check out the readme to find out more about the motivation behind the project, and dive into it with an example.</p>\n<h3><a id=\\"Demos_Samples_Solutions_and_Workshops_58\\"></a><strong>Demos, Samples, Solutions and Workshops</strong></h3>\\n<h3><a id=\\"mapperforfhir_60\\"></a><strong>mapper-for-fhir</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27p\\" target=\\"_blank\\">mapper-for-fhir</a></ins> FHIR is a standard for health care data exchange, and this repo provides assets that allow for the automated deployment of an HL7v2 to FHIR mapping solution, leveraging native Amazon Web Services services. The repo provides a CDK application that simplifies deployment.</p>\n<h3><a id=\\"cfsamopenapifileorganizationdemo_64\\"></a><strong>cf-sam-openapi-file-organization-demo</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27w\\" target=\\"_blank\\">cf-sam-openapi-file-organization-demo</a></ins> This project is the API back-end for a widget tracking website. Widget tracking is simplistic - widgets have only a unique name and a colour descriptor for properties. If you want to explore how to approach API design, then this is a good repo to dig into.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/f0c5675175e1481380fe504fa93d0c36_Architecture-Diagram.png\\" alt=\\"ArchitectureDiagram.png\\" /></p>\n<h3><a id=\\"samacceleratenestedstacksdemo_70\\"></a><strong>sam-accelerate-nested-stacks-demo</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27v\\" target=\\"_blank\\">sam-accelerate-nested-stacks-demo</a></ins> This repository shows how to use CloudFormation nested stacks with Amazon Web Services SAM Accelerate. 
Nested stacks are stacks created as part of other stacks. In this demo repo, there are four separate stacks that make up the entire solution. Amazon Web Services SAM manages all four as CloudFormation nested stacks. During development, we show how to use SAM Accelerate to quickly update resources, shortening the development loop.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/1b1e7e05f526495baaca10f744f6f528_orders.png\\" alt=\\"orders.png\\" /></p>\n<h3><a id=\\"decouplingmicroserviceslambdaamazonmqrabbitmq_78\\"></a><strong>decoupling-microservices-lambda-amazonmq-rabbitmq</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27u\\" target=\\"_blank\\">decoupling-microservices-lambda-amazonmq-rabbitmq</a></ins> This project is a solution architecture that demonstrates decoupling microservices with Amazon MQ for RabbitMQ and Amazon Web Services Lambda. A decoupled application architecture allows each component to perform its tasks independently, and a change in one service shouldn’t require a change in the other services.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/2f2fafa35f474388991dc8ab011f258a_mq_decoupled_apps.png\\" alt=\\"mq_decoupled_apps.png\\" /></p>\n<h3><a id=\\"howtowritemorecorrectsoftwareworkshop_84\\"></a><strong>how-to-write-more-correct-software-workshop</strong></h3>\\n<p><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27t\\" target=\\"_blank\\">how-to-write-more-correct-software-workshop</a></ins> Last week I featured duvet, a tool to help you codify and automate validation that your software honours RFC specs. This repo contains a workshop that walks you through a practical example of that, but also features dafny, a programming language that formally verifies that your implementation matches your specification. 
I have this on my weekend to-do list.</p>\n<h2><a id=\\"Amazon_Web_Services_and_Community_blog_posts_88\\"></a><strong>Amazon Web Services and Community blog posts</strong></h2>\\n<h3><a id=\\"Kubernetes_90\\"></a><strong>Kubernetes</strong></h3>\\n<p>Benson Kwong has been busy putting this blog post together, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27z\\" target=\\"_blank\\">Multi-cluster management for Kubernetes with Cluster API and Argo CD</a></ins>, where he introduces what Cluster API is and explains why you can use this useful tool for managing multiple Kubernetes clusters instead of struggling with different APIs and tool sets to maintain them. He also covers how you can integrate ArgoCD to add that sprinkle of continuous delivery with Git as your source of truth. [hands on]</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/b1fa57b263f848bb936c02e8946bf6b0_kwong.png\\" alt=\\"kwong.png\\" /></p>\n<h2><a id=\\"CoreWCF_96\\"></a><strong>CoreWCF</strong></h2>\\n<p>CoreWCF is a port of the service side of Windows Communication Foundation (WCF) to .NET Core. The goal of this project is to enable existing WCF services to move to .NET Core. In the post, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27y\\" target=\\"_blank\\">Running your modern CoreWCF application on Amazon Web Services</a></ins>, Ashish Bhatia and Jagadeesh Chitikesi show you how to deploy a CoreWCF application on an Amazon Linux Graviton2 instance. [hands on]</p>\n<h3><a id=\\"MySQL_100\\"></a><strong>MySQL</strong></h3>\\n<p>In the post, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/280\\" target=\\"_blank\\">Enable change data capture on Amazon RDS for MySQL applications that are using XA transactions</a></ins>, Stanley Chukwuemeke, Baruch Assif Osoveskiy, and Kehinde Otubamowo have collaborated to present a solution to safely replicate change streams from a MySQL database using XA transactions to downstream OpenSearch. 
[hands on]</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/7454a19155784c1fbfbdd083fd04e7e2_DBBLOG-2409-arch-diag-1-1.png\\" alt=\\"DBBLOG2409archdiag11.png\\" /></p>\n<h3><a id=\\"Other_posts_and_quick_reads_106\\"></a><strong>Other posts and quick reads</strong></h3>\\n<ul>\\n<li><ins><a href=\\"https://aws-oss.beachgeek.co.uk/27x\\" target=\\"_blank\\">Create a Multi-Region Python Package Publishing Pipeline with Amazon Web Services CDK and CodePipeline</a></ins> walks you through how to deploy a CodePipeline pipeline to automate the publishing of Python packages to multiple CodeArtifact repositories in separate regions [hands on]</li>\n</ul>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/5f72721d3ea045b1849e44a11374588e_devops_2053_1.png\\" alt=\\"devops_2053_1.png\\" /></p>\n<ul>\\n<li><ins><a href=\\"https://aws-oss.beachgeek.co.uk/281\\" target=\\"_blank\\">Microservice observability with Amazon OpenSearch Service part 1: Trace and log correlation</a></ins> is a two-part blog post that uses a sample microservice to show you how you can implement observability using a number of open source tools such as Fluentbit, Amazon Web Services Distro for OpenTelemetry, and OpenSearch [hands on]</li>\n</ul>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/98a1772b6a1d471b9bf99f010ecd2f3d_BDB-2223-P1-image001.jpg\\" alt=\\"BDB2223P1image001.jpg\\" /></p>\n<ul>\\n<li><ins><a href=\\"https://aws-oss.beachgeek.co.uk/282\\" target=\\"_blank\\">Migrate Oracle hierarchical queries to Amazon Aurora PostgreSQL</a></ins> demonstrates via sample queries how you can migrate Oracle hierarchical queries using a number of keywords to PostgreSQL [hands on]</li>\n<li><ins><a href=\\"https://aws-oss.beachgeek.co.uk/283\\" target=\\"_blank\\">What to consider when modernizing APIs with GraphQL on Amazon Web Services</a></ins> provides a good primer on how GraphQL works and how integrating it with Amazon Web Services services can help you build modern applications</li>\n<li><ins><a 
href=\\"https://aws-oss.beachgeek.co.uk/284\\" target=\\"_blank\\">Managing Computer Labs on Amazon AppStream 2.0 with Open Source Virtual Application Management</a></ins> provides a hands-on guide to using an open source project previously featured in this newsletter to help administrators programmatically create AppStream 2.0 images [hands on]</li>\n</ul>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/c574c1cc1c824eaaba08f3646ffaa33f_VAM-architecture-diagram.png\\" alt=\\"VAMarchitecturediagram.png\\" /></p>\n<h2><a id=\\"Quick_updates_124\\"></a><strong>Quick updates</strong></h2>\\n<h3><a id=\\"PHP_126\\"></a><strong>PHP</strong></h3>\\n<p>Amazon Web Services App Runner now supports PHP 8.1, Go 1.18, .Net 6, and Ruby 3.1 managed runtimes for building and running web applications and APIs. These runtimes enable you to leverage the App Runner “build from source” capability to build and deploy directly from your source code repository without needing to learn the internals of building and managing your own container images.</p>\n<p>Starting today, you can build and run your services based on PHP 8.1, Go 1.18, .Net 6, and Ruby 3.1 directly from your source code on App Runner. All these new managed runtimes in App Runner are active long-term support (LTS) major versions.</p>\n<h3><a id=\\"Apache_Kafka_132\\"></a><strong>Apache Kafka</strong></h3>\\n<p>Amazon Managed Streaming for Apache Kafka (MSK) now offers Tiered Storage, which brings a virtually unlimited and low-cost storage tier. Tiered Storage lets you store and process data using the same Kafka APIs and clients, while cutting your storage costs by 50% or more over existing MSK storage options. Tiered Storage makes it easy and cost-effective when you need a longer safety buffer to handle unexpected processing delays or build new stream processing applications. 
You can now scale your compute and storage independently, simplifying operations.</p>\n<p>Amazon MSK Connect also now supports Private DNS hostnames for enhanced security. With Private DNS hostname support in MSK Connect, you can configure connectors to reference public or private domain names. Connectors will use the DNS servers configured in your VPC’s DHCP option set to resolve domain names. You can now use MSK Connect to privately connect with databases, data warehouses and other resources in your VPC to comply with your security needs.</p>\n<h3><a id=\\"Amazon_EMR_138\\"></a><strong>Amazon EMR</strong></h3>\\n<p>Amazon EMR 6.8 has seen a number of updates that you should be aware of.</p>\n<p><em>PrestoDB and Trino</em></p>\n<p>With PrestoDB and Trino on EMR 6.8, users benefit from a configuration setting called strict mode that prevents cost overruns due to long-running queries. Customers have told us that poorly written SQL queries can sometimes run for a long time and consume resources from other business-critical workloads. To help administrators take action on such queries, we are introducing a strict mode setting that allows warning on or rejecting certain types of queries. Examples include queries without predicates on partitioned columns that result in large table scans, queries that involve a cross join between large tables, or queries that sort a large number of rows without a limit. You can set up the strict mode configuration during cluster creation and also override the setting using session properties. You can apply strict mode checks for select, insert, create table as select and explain analyse query types.</p>\n<p>We are also excited to announce that Amazon EMR PrestoDB and Trino have added new features to handle Spot interruptions that help you run your queries cost-effectively and reliably. Spot Instances in Amazon EMR allow you to run big data workloads on spare Amazon EC2 capacity at a reduced cost compared to On-Demand instances. 
However, Amazon EC2 can interrupt Spot Instances with a two-minute notification. PrestoDB/Trino queries fail when Spot nodes are terminated. This has meant that customers were unable to run such workloads on Spot Instances and take advantage of lower costs. In EMR 6.7, we added a new capability to the PrestoDB/Trino engine to detect Spot interruptions and determine if the existing queries can complete within two minutes on those nodes. If the queries cannot finish, we fail quickly and retry the queries on different nodes. The Amazon EMR PrestoDB/Trino engine also does not schedule new queries on Spot nodes that are about to be reclaimed. With these two new features, you get the best of both worlds: improved resiliency with the PrestoDB/Trino engine on Amazon EMR, and the ability to run queries economically on Spot nodes.</p>\n<p><em>Hive</em></p>\\n<p>Hive users run the metastore check command with the repair table option (MSCK REPAIR TABLE) to update the partition metadata in the Hive metastore for partitions that were directly added to or removed from the file system (S3 or HDFS). When run, the MSCK REPAIR command must make a file system call for each partition to check whether it exists. This step could take a long time if the table has thousands of partitions. In EMR 6.5, we introduced an optimisation to the MSCK REPAIR command in Hive to reduce the number of S3 file system calls when fetching partitions. This feature improves the performance of the MSCK command (~15-20x on 10k+ partitions) due to the reduced number of file system calls, especially when working with tables that have a large number of partitions. Previously, you had to enable this feature by explicitly setting a flag. 
Starting with Amazon EMR 6.8, we further reduced the number of S3 filesystem calls to make MSCK REPAIR run faster and enabled this feature by default.</p>\n<p>In addition to the MSCK REPAIR TABLE optimisation, we would also like to share that Amazon EMR Hive users can now use Parquet modular encryption to encrypt and authenticate sensitive information in Parquet files. It is a challenging task to protect the privacy and integrity of sensitive data at scale while keeping the Parquet functionality intact. Data protection solutions such as encrypting files or the storage layer are currently used to encrypt Parquet files; however, they could lead to performance degradation. With Parquet modular encryption, you can not only enable granular access control but also preserve the Parquet optimisations such as columnar projection, predicate pushdown, encoding and compression. Using Parquet modular encryption, Amazon EMR Hive users can protect both Parquet data and metadata, use different encryption keys for different columns, and perform partial encryption of only sensitive columns. It also allows clients to check the integrity of the data retrieved while keeping all Parquet optimisations.</p>\n<p><em>Apache Flink</em></p>\\n<p>Amazon EMR includes Apache Flink 1.15.1. This feature is available on EMR on EC2.<br />\\nApache Flink is an open source framework and engine for processing data streams. Apache Flink 1.15.1 on EMR 6.8 includes 62 bug fixes, vulnerability fixes, and minor improvements over Flink 1.15.0. Key features include:</p>\n<p>Watermark alignment (Beta) across data sources: Event-time processing in Flink depends on special timestamped elements, called watermarks, that are inserted into the stream either by the data sources or by a watermark generator. A watermark with a timestamp t can be understood as an assertion that all events with timestamps &lt; t have already arrived. Watermark alignment is useful when processing sources with different velocities of events, e.g. 
when one source is idle or one source emits records relatively faster than others, you can enable watermark alignment for each source separately. Flink aligns watermarks by pausing the highest velocity source and continuing to read records from other sources until the watermarks are aligned.<br />\\nSQL version upgrade: Introducing JSON plans, which make it easier to export and import SQL query plans in a structured JSON format. Today, version upgrades can alter the topology of SQL queries, which can introduce snapshot incompatibility across versions. This makes upgrading Flink versions challenging. With this feature, both the Table API and SQL will provide a way to compile and execute a plan, ensuring the same topology for SQL queries throughout different versions and making it more reliable to upgrade to future versions. Users who want to give it a try can create a JSON plan that can then be used to restore a Flink job based on the old operator structure.</p>\n<p><em>Apache Hudi and Apache Iceberg</em></p>\\n<p>Amazon EMR release 6.8 now supports Apache Hudi 0.11.1 and Apache Iceberg 0.14.0. You can use these frameworks on Amazon EMR on EC2 and Amazon EMR on EKS, as well as on Amazon EMR Serverless.</p>\n<p>Apache Hudi 0.11.1 on Amazon EMR 6.8 includes support for Spark 3.3.0. It adds Multi-Modal Index support and Data Skipping with the Metadata Table, which allows adding bloom filter and column stats indexes to tables and can significantly improve query performance, and an Async Indexer service, which allows users to create different kinds of indices (e.g., files, bloom filters, and column stats) in the metadata table without blocking ingestion. It also includes Spark SQL improvements that add support for updating or deleting records in Hudi tables using non-primary-key fields and time travel queries via the timestamp as of syntax, plus Flink integration improvements with support for both Flink 1.13.x and 1.14.x and for complex data types such as Map and Array. 
In addition, Hudi 0.11.1 includes bug fixes over Hudi 0.11.0, which was available in Amazon EMR release 6.7.</p>\n<p>Apache Iceberg 0.14.0 on Amazon EMR 6.8 includes support for Spark 3.3.0. It adds merge-on-read support for MERGE and UPDATE statements, and support for rewriting partitions using Z-order, which allows you to re-organize partitions to be efficient with query predicates on multiple columns and to keep similar data together. It also includes several performance improvements for scan planning in Spark queries and adds support for row group skipping using Parquet bloom filters.</p>\n<h3><a id=\\"Amazon_Web_Services_ParallelCluster_170\\"></a><strong>Amazon Web Services ParallelCluster</strong></h3>\\n<p>Amazon Web Services ParallelCluster 3.3 is now generally available and introduces a new feature for compute resource optimisation. With this new feature, you can map a compute resource to a list of Amazon EC2 instance types with an allocation strategy to optimize compute capacity for your HPC jobs. Other features include updates that support dynamically mounting shared storage, Slurm accounting, and Amazon EC2 On-Demand Capacity Reservations (ODCR).</p>\n<p>Check out the announcement for more details: <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27h\\" target=\\"_blank\\">Amazon Web Services ParallelCluster 3.3: multiple instance type allocation and other top requested features</a></ins></p>\n<p>Matt Vaughn has also published a blog post, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27i\\" target=\\"_blank\\">Support for Instance Allocation Flexibility in Amazon Web Services ParallelCluster</a></ins> 3.3, where he explains in detail the newly announced “multiple instance type allocation” feature in ParallelCluster 3.3.0. This feature enables you to specify multiple instance types to use when scaling up the compute resources for a Slurm job queue. 
Your HPC workloads will have more paths forward to get access to the EC2 capacity they need, helping you to get more computing done.</p>\n<h2><a id=\\"Videos_of_the_week_179\\"></a><strong>Videos of the week</strong></h2>\\n<h3><a id=\\"Firecracker_181\\"></a><strong>Firecracker</strong></h3>\\n<p>In this lightning talk from KubeCon, Richard Case from SUSE brings you up to speed with what microVMs are and how they can be useful, and provides plenty of examples. This is great stuff for getting a better understanding of how you might use Firecracker, an open source microVM project from Amazon Web Services.</p>\n<p><video src=\\"https://dev-media.amazoncloud.cn/9e0b0ec0676445b4b7cb91376288ef25_Lightning%20Talk%EF%BC%9A%20What%20Are%20MicroVMs%EF%BC%9F%20And%20Why%20Should%20I%20Care%EF%BC%9F%20-%20Richard%20Case%2C%20SUSE.mp4\\" controls=\\"controls\\"></video></p>\\n<h3><a id=\\"Babelfish_for_Aurora_PostgreSQL_188\\"></a><strong>Babelfish for Aurora PostgreSQL</strong></h3>\\n<p>Join fellow Developer Advocate John Russell as he shows you how to set up a database server using the combination of Babelfish and Aurora, connect to the database, and run both PostgreSQL and T-SQL statements. Babelfish for PostgreSQL is an open source project that provides a compatibility layer for the SQL dialect (T-SQL) used by Microsoft SQL Server. Babelfish helps migrate database applications onto PostgreSQL with minimal changes to the application code.</p>\n<p><video src=\\"https://dev-media.amazoncloud.cn/0566de9e8c594c0ea93ae67579a3463c_Get%20Up%20and%20Running%20with%20Babelfish%20for%20Aurora%20PostgreSQL%20%EF%BD%9C%20Amazon%20Web%20Services.mp4\\" controls=\\"controls\\"></video></p>\\n<h3><a id=\\"Build_on_Open_Source_194\\"></a><strong>Build on Open Source</strong></h3>\\n<p>For those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. 
Expect plenty of code, demos, and hopefully laughs.</p>\n<p>We have put together a playlist so that you can easily access all the other episodes of the Build on Open Source show: <ins><a href=\\"https://aws-oss.beachgeek.co.uk/24u\\" target=\\"_blank\\">Build on Open Source playlist</a></ins></p>\n<h2><a id=\\"Events_for_your_diary_201\\"></a><strong>Events for your diary</strong></h2>\\n<h3><a id=\\"OpenSearch__Development_Backlog__Triage_Meeting_Security_203\\"></a><strong>OpenSearch - Development Backlog &amp; Triage Meeting Security</strong></h3>\\n<h3><a id=\\"7th_November__12pm_PT_204\\"></a><strong>7th November - 12pm PT</strong></h3>\\n<p>The OpenSearch engineering team working on the Security repo have opened up their Backlog &amp; Triage meetings to the public. This is a great opportunity to find out more about the inner workings of open source projects such as OpenSearch. Don’t worry if you cannot make this meeting, as further meetings are currently scheduled from the 7th of November through to December 19th.</p>\n<p>Check out the entire <ins><a href=\\"https://aws-oss.beachgeek.co.uk/285\\" target=\\"_blank\\">list here</a></ins>.</p>\n<h3><a id=\\"Open_Source__Amazon_Web_Services_IoT_210\\"></a>Open Source &amp; Amazon Web Services IoT</h3>\\n<h3><a id=\\"7th_November_4pm_IST_211\\"></a>7th November, 4pm IST</h3>\\n<p>Internet of Things devices and other categories of hardware with edge computing capabilities that communicate over the internet to perform remote actions require stable, easy-to-use and, most importantly, secure operating software that can be audited independently. 
In this session, the presenter will look at ways that Amazon Web Services supports and contributes to the open-source community to make these devices more resilient and feature-rich through solutions such as Amazon Web Services Greengrass and Amazon Web Services IoT Core.</p>\n<p>Amazon Web Services Hero Faizal Khan will be your host, and you can sign up <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27o\\" target=\\"_blank\\">via the registration page here.</a></ins></p>\n<h3><a id=\\"Open_Source_Edinburgh_217\\"></a><strong>Open Source Edinburgh</strong></h3>\\n<h3><a id=\\"9th_November_530pm_Scott_Logic_in_Edinburgh_218\\"></a><strong>9th November, 5:30pm, Scott Logic in Edinburgh</strong></h3>\\n<p>I will be talking at the Open Source Edinburgh meet-up this week, and you can find details on the location and how to reserve your spot by clicking on the <ins><a href=\\"https://www.meetup.com/open-source-edinburgh/events/289090920/\\" target=\\"_blank\\">meetup.com link</a></ins>. Hope to see some of you there.</p>\n<h2><a id=\\"Running_Open_Source_Transcoding_Server_on_Amazon_EKS_222\\"></a>Running an Open Source Transcoding Server on Amazon EKS</h2>\\n<h3><a id=\\"Friday_18th_1900_WIB_223\\"></a><strong>Friday, November 18th, 19:00 WIB</strong></h3>\\n<p>Join Beny Ibrani and the Amazon Web Services User Group Indonesia for this session (in the local language, I believe), where Beny will show you how you can use open source transcoding software running on Amazon EKS.</p>\n<p>This session will be streamed on YouTube, so <ins><a href=\\"https://aws-oss.beachgeek.co.uk/27j\\" target=\\"_blank\\">check it out here</a></ins>.</p>\n<h3><a id=\\"Build_on_Amazon_Web_Services_Open_Source_230\\"></a><strong>Build on Amazon Web Services Open Source</strong></h3>\\n<h3><a id=\\"November_18th_9am_BST_231\\"></a><strong>November 18th, 9am BST</strong></h3>\\n<p>Join us for the sixth episode of the Build on Amazon Web Services series, featuring a live 
round-up of the latest projects and news, as well as a special guest speaker. We have another special guest lined up, and we will announce this next week. Follow the show on @buildonopen for more details. Check it out on <ins><a href=\\"https://twitch.tv/aws\\" target=\\"_blank\\">https://twitch.tv/aws</a></ins></p>\n<h3><a id=\\"Amazon_Web_Services_Elastic_Kubernetes_Service_EKS_Workshop_235\\"></a><strong>Amazon Web Services Elastic Kubernetes Service (EKS) Workshop</strong></h3>\\n<h3><a id=\\"November_10th_London_5pm_236\\"></a><strong>November 10th, 5pm, London</strong></h3>\\n<p>Join us for an interactive workshop on containers, Docker, Fargate and Amazon EKS, hosted by ClearScale and Amazon Web Services. This live, virtual workshop includes three hours of interactive presentation and hands-on lab work. You will take part in the setup and deployment of containers using EKS. Follow along and work directly with Amazon Web Services professionals and ClearScale (an Amazon Web Services Premier Tier Services Partner) in this Level 200 training session.</p>\n<p>You can find out more about this event by <ins><a href=\\"https://aws-oss.beachgeek.co.uk/22y\\" target=\\"_blank\\">checking out the event page and signing up.</a></ins></p>\n<h3><a id=\\"reInvent_242\\"></a><strong>re:Invent</strong></h3>\\n<h3><a id=\\"November_28th__December_3rd_Las_Vegas_243\\"></a><strong>November 28th - December 3rd, Las Vegas</strong></h3>\\n<p>re:Invent is only a few weeks away, so I want to share a few things that will hopefully be of interest.</p>\n<p>First up, we will be running the Build On Live stream throughout re:Invent and we would love to feature you! If you are going to re:Invent, or you know a community member who is and think they would absolutely love to attend the livestream, we want to hear from you. 
Please nominate a community member you want to hear from during Build On Live <ins><a href=\\"https://eventbox.dev/survey/6B0ED1J\\" target=\\"_blank\\">using this survey.</a></ins></p>\n<p>Second, for a handy way to look at all the amazing open source sessions, check out this <ins><a href=\\"https://aws-oss.beachgeek.co.uk/252\\" target=\\"_blank\\">dashboard</a></ins> [sign up required]. I would love to hear which ones you are excited about, so please let me know in the comments or via Twitter. If you want to know my top three must-watch sessions, this is what I would attend (sadly, as an Amazon Web Services employee I am not allowed to attend sessions):</p>\n<ol>\\n<li>OPN306 Amazon Web Services Lambda Powertools: Lessons from the road to 10 million downloads - Heitor Lessa is going to deliver an amazing session on the journey from idea to one of the most loved and used open source tools for Amazon Web Services Lambda users</li>\n<li>BOA204 When security, safety, and urgency all matter: Handling Log4Shell - Cannot wait for this session from Abbey Fuller, who will walk us through how we managed this incident</li>\n<li>OPN202 Maintaining the Amazon Web Services Amplify Framework in the open - Matt Auerbach and Ashish Nanda are going to share details on how Amplify engineering managers work with the OSS community to build open-source software</li>\n</ol>\\n<p>There are many other great open source sessions, and hopefully I will put together a more comprehensive list as we approach re:Invent.</p>\n<h3><a id=\\"OpenSearch_259\\"></a>OpenSearch</h3>\\n<h3><a id=\\"Every_other_Tuesday_3pm_GMT_260\\"></a>Every other Tuesday, 3pm GMT</h3>\\n<p>This regular meet-up is for anyone interested in OpenSearch &amp; Open Distro. 
All skill levels are welcome, and they cover and welcome talks on topics including search, logging, log analytics, and data visualisation.</p>\n<p>Sign up to the next session, <ins><a href=\\"https://aws-oss.beachgeek.co.uk/1az\\" target=\\"_blank\\">OpenSearch Community Meeting</a></ins>.</p>\n<h2><a id=\\"Stay_in_touch_with_open_source_at_Amazon_Web_Services_266\\"></a><strong>Stay in touch with open source at Amazon Web Services</strong></h2>\\n<p>I hope this summary has been useful. Remember to check out the <ins><a href=\\"https://aws.amazon.com/opensource/?opensource-all.sort-by=item.additionalFields.startDate&amp;opensource-all.sort-order=asc\\" target=\\"_blank\\">Open Source homepage</a></ins> to keep up to date with all our activity in open source, and follow us on <ins><a href=\\"https://twitter.com/AWSOpen\\" target=\\"_blank\\">@AWSOpen</a></ins>.</p>\n"}