Amazon Web Services open source newsletter, #172

## September 4th, 2023 - Instalment #172

Welcome to #172 of the Amazon Web Services open source newsletter, your reliable source for all open source on Amazon Web Services goodness. What do we have for you this week? Well, more new projects to check out, and plenty of fresh content on the open source projects you all love. We have a tool to help you export your DynamoDB tables as CSV files, a tool that goes beyond tracking cost and actually shuts down resources to help you manage your Amazon Web Services budget, a cool dashboard to help you stay on top of your EC2 configurations, a couple of useful utilities to simplify working with files on [Amazon S3](, and a sample Cedar project that helps you implement a Lambda authoriser.

Also featured in this edition is content covering open source technologies including Amazon Web Services SAM, cedarpy, Cedar, Amazon Lambda SnapStart, GraalVM, OpenSearch, Lustre, Kubernetes, MariaDB, MySQL, PostgreSQL, Amazon Amplify, GitLab, Next.js, Amazon ParallelCluster, Apache Spark, [Amazon EMR](, Apache Flink, Apache Airflow, Kyverno, CDK8s, sudo-rs, and Sphinx.

Also, be sure to check out the events section as there are a few events happening this week.

**Feedback**

Before you dive in, however, I need your help! Please, please, please take 1 minute to [complete this short survey]( and you will forever have my gratitude!

### Celebrating open source contributors

The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: Iliyas Maner, Sonia García-Ruiz, k.goto, Stephen Kuenzli, Rehan van der Merwe, Josh Aas, Julian Michel, Olawale Olaleye, Shuting Zhao, Abhishek Gupta, Suman Debnath, Elliott Cordo, Channy Yun, Le Clue Lubbe, Munish Dabra, Lucas Vieira Souza da Silva, Rajiv Upadhyay, Abdallah Shaban, Srinivas Jasti, Sheetal Joshi, Raj Ramasubbu, Brandon Carroll, Vadym Kazulkin, and Brad Knowles.

### Latest open source projects

*The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution. The projects mentioned here do not represent any formal recommendation or endorsement, I am just sharing for greater awareness as I think they look useful and interesting!*

#### Tools

**dynamodump**

[dynamodump]( is a new tool from Iliyas Maner that provides a simple way to dump your [Amazon DynamoDB]( table contents to a comma separated value file.

**cdk-cost-limit**

[cdk-cost-limit]( is a collection of CDK constructs to deploy cost-aware, self-limiting resources. This package lets you set spending limits on Amazon Web Services. While existing Amazon Web Services solutions merely alert, this library disables resources, using non-destructive operations, when budgets are hit. This library includes an Aspect and a collection of Amazon Web Services CDK Level-2 Constructs. They deploy additional resources to compute real-time spending and halt resources when budgets are met (e.g. a Lambda function's reserved concurrency is set to 0). Check out the README for important details around how this works, and the potential impact for your applications. The project is looking for feedback, so take a look and let them know what you think.
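To make the "halt instead of alert" idea a little more concrete, here is a minimal Python sketch of the pattern described above: track spend against a limit and, once the limit is reached, set a Lambda function's reserved concurrency to 0. This is not how cdk-cost-limit itself is implemented (check the project's README for that). The `Budget` class, the `enforce` helper and the `RecordingLambdaClient` test double are all hypothetical names made up for illustration. The only real API referenced is the boto3 Lambda client method `put_function_concurrency`, whose keyword arguments the test double mirrors.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingLambdaClient:
    """Hypothetical test double that mirrors boto3's Lambda client method and keyword names."""

    calls: list = field(default_factory=list)

    def put_function_concurrency(
        self, *, FunctionName: str, ReservedConcurrentExecutions: int
    ) -> dict:
        # Record the call instead of talking to Amazon Web Services.
        self.calls.append((FunctionName, ReservedConcurrentExecutions))
        return {"ReservedConcurrentExecutions": ReservedConcurrentExecutions}


@dataclass
class Budget:
    """Hypothetical running total of spend against a fixed limit."""

    limit_usd: float
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> bool:
        """Add a cost and report whether the budget has been reached."""
        self.spent_usd += cost_usd
        return self.spent_usd >= self.limit_usd


def enforce(client, function_name: str, budget: Budget, cost_usd: float) -> bool:
    """Record spend; once the budget is hit, throttle the function rather than delete it."""
    if not budget.record(cost_usd):
        return False
    # Reserved concurrency of 0 stops new invocations but leaves the function in place.
    client.put_function_concurrency(
        FunctionName=function_name, ReservedConcurrentExecutions=0
    )
    return True


if __name__ == "__main__":
    client = RecordingLambdaClient()
    budget = Budget(limit_usd=1.0)
    print(enforce(client, "my-function", budget, 0.25))  # still under budget
    print(enforce(client, "my-function", budget, 0.75))  # budget reached, function throttled
    print(client.calls)
```

Passing the client in as an argument keeps the sketch runnable without credentials. In a real deployment you would pass `boto3.client("lambda")` instead of the test double, and the operation stays non-destructive because the function can be re-enabled by removing or raising the concurrency setting.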
**ec2-flexibility-score-dashboard**

[ec2-flexibility-score-dashboard]( is a nice project that helps you to assess any configuration used to launch instances through an Auto Scaling Group (ASG) against the recommended EC2 best practices. It converts the best practice adoption into a “flexibility score” that can be used to identify, improve, and monitor the configurations (and subsequently, overall organisation level adoption of Spot best practices) which may have room to improve their flexibility by implementing architectural best practices. The following illustration shows the EC2 Flexibility Score Dashboard:

![image.png]( "image.png")

**aws-s3-integrity-check**

[aws-s3-integrity-check]( is a simple tool from Sonia García-Ruiz that provides a Bash script to check the MD5 integrity of a set of files that have previously been uploaded into an [Amazon S3]( bucket. There is a detailed README on how this works, together with plenty of examples and some limitations you should be aware of.

**cls3**

[cls3]( is a very handy tool from Amazon Web Services Community Builder k.goto that helps you to CLear S3 Buckets. It empties (so deletes all objects and versions/delete-markers in) S3 Buckets or deletes the buckets themselves. You can check out the supporting blog post, [Tool for fast deletion and emptying of S3 buckets (versioning supported)](

### Demos, Samples, Solutions and Workshops

**cedarpy-example-hello-photos**

[cedarpy-example-hello-photos]( is a sample project from Amazon Web Services Community Builder Stephen Kuenzli that provides an example of how to build a Lambda Authorizer using Cedar Policy and cedarpy. If you check out the Videos section below, you can watch the Twitch session that Stephen did with my colleague Brandon that walks you through this demo.
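If you have not built a Lambda authoriser before, the following Python sketch shows the general shape of one. It is not taken from Stephen's project and does not call cedarpy. The `is_authorized` function and the `ALLOWED` table are hypothetical stand-ins for the point where a real authoriser would evaluate Cedar policies, and treating the token as the user name is an assumption made purely to keep the example short. What is standard is the event shape for a TOKEN authoriser (`authorizationToken`, `methodArn`) and the IAM policy document that API Gateway expects back.

```python
from typing import Any

# Hypothetical allow-list standing in for a Cedar policy set: (principal, HTTP method).
ALLOWED: set[tuple[str, str]] = {("alice", "GET"), ("alice", "POST"), ("bob", "GET")}


def is_authorized(principal: str, action: str) -> bool:
    """Stand-in for a policy engine call; a real authoriser would evaluate Cedar policies here."""
    return (principal, action) in ALLOWED


def build_response(principal: str, effect: str, method_arn: str) -> dict[str, Any]:
    """Build the IAM policy document that API Gateway expects from a Lambda authoriser."""
    return {
        "principalId": principal,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": method_arn,
                }
            ],
        },
    }


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Entry point for a TOKEN-type authoriser; the token is treated as the user name (assumption)."""
    principal = event.get("authorizationToken", "") or "anonymous"
    method_arn = event["methodArn"]
    # Method ARN shape: arn:aws:execute-api:region:account:api-id/stage/METHOD/path
    http_method = method_arn.split("/")[2]
    effect = "Allow" if is_authorized(principal, http_method) else "Deny"
    return build_response(principal, effect, method_arn)


if __name__ == "__main__":
    sample_event = {
        "type": "TOKEN",
        "authorizationToken": "bob",
        "methodArn": "arn:aws:execute-api:us-east-1:123456789012:abc123/prod/GET/photos",
    }
    print(handler(sample_event))
```

Keeping the decision in one small function means the allow-list can later be swapped for a call into a policy engine without touching the response-building code.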
**kendra_retriever_samples**

[kendra_retriever_samples]( This repo contains a number of code samples and supporting CloudFormation templates that help you work with LangChain and [Amazon Kendra]( It currently has samples for working with a Kendra retriever class to execute a QA chain for SageMaker, OpenAI and Anthropic providers. To help you deploy this code and understand how it all works, you can follow along with the blog post, [Deploy self-service question answering with the QnABot on Amazon Web Services solution powered by Amazon Lex with Amazon Kendra and large language models](

![image.png]( "image.png")

### Amazon Web Services and Community blog posts

**Community round up**

We have another great selection of community originated content this week, covering a broad set of open source technologies.

First up is Amazon Web Services Hero Rehan van der Merwe taking a look at TypeScript Remote Procedure Call (better known as tRPC). What is it, I can hear you all asking? Over-fetching and under-fetching are common issues with RESTful APIs. Like GraphQL, tRPC allows you to use TypeScript to define and get only the data you need, avoiding bloated responses and duplicate requests. Rehan dives deep in his post, [Amazon Lambda with tRPC and separate repos using OpenAPI]( that provides a detailed, hands-on guide on how you can use tRPC, trpc-openapi (OpenAPI support for tRPC), and Amazon CDK to deploy this on Amazon Lambda. Be sure to check out the other posts Rehan has been publishing on this topic.

Josh Aas shared details of the first stable release of sudo-rs, a Rust rewrite of the critical sudo utility, in his post [The First Stable Release of a Memory Safe sudo Implementation]( This is a good example of the ongoing commitment from Amazon Web Services to supporting the work of the Internet Security Research Group (ISRG) to improve the memory safety of critical open source tools used by developers.
Sphinx is a great tool for writing documentation, and something that I first got to grips with when contributing to the Apache Airflow project (which I blogged about a while back). Amazon Web Services Community Builder Julian Michel has put together [How to automatically release Sphinx documentation using CDK Pipelines and a custom CodeBuild image]( that describes how to publish Sphinx projects using CDK pipelines. Very nice indeed.

![image.png]( "image.png")

Next up is Olawale Olaleye with his post, [Building an Amazon EKS Cluster Preconfigured to Run High Traffic Microservices]( which is a nice tutorial that shows you how you can deploy high traffic Kubernetes workloads on [Amazon EKS](

Staying in Cloud Native land, we have Shuting Zhao who wrote, [Verifying images in a private Amazon ECR with Kyverno and IAM Roles for Service Accounts (IRSA)]( that shows how you can securely verify your container images using Kyverno, a CNCF policy engine designed for Kubernetes. Make sure you read this one.

To wrap up the Kubernetes content in this section we have my colleague Abhishek Gupta who has put together [Simplifying Your Kubernetes Infrastructure With CDK8s](, that shares details from his talk on how you can use CDK for Kubernetes, or CDK8s, an open-source CNCF project that helps represent Kubernetes resources and applications as code (not YAML!). There is not enough content on this project, so make sure you check that out too.

To finish up this week's community round up, we have a couple of data related posts. First up is my good friend Suman Debnath who has put together, [The Ultimate Guide to Running Apache Spark on Amazon Web Services]( where he walks through the decision-making questions that help you navigate the options and choose the most suitable Amazon Web Services service for your Spark workloads.
Finally we have Amazon Web Services Hero Elliott Cordo who writes about one of my favourite open source projects, Apache Airflow, in [The Wrath of Unicron - When Airflow Gets Scary]( And whilst this does sound like an episode of Star Trek, I can assure you it is well worth reading as it provides a nice approach on how you can use SNS and SQS to orchestrate your workflows across orchestrators (multiple Airflow environments). Whilst this might not be suitable for every use case, posts like this provide useful ideas to keep in your back pocket should the need arise.

**Apache Flink**

I was super happy with the announcement last week that we renamed [Amazon Kinesis]( Data Analytics to Amazon Managed Service for Apache Flink. The name change is effective in the Amazon Web Services Management Console, documentation, and service webpages. There are no other changes, including to service endpoints, APIs, the Amazon Web Services Command Line Interface (Amazon CLI), the Amazon Identity and Access Management (IAM) access policies, [Amazon CloudWatch]( metrics, or the Amazon Web Services Billing console dashboard. Your existing applications will continue to work as they did previously. My colleague Channy Yun has put together everything you need to know in the blog post, [Announcing Amazon Managed Service for Apache Flink Renamed from Amazon Kinesis Data Analytics](

![image.png]( "image.png")

**Apache Spark**

In [Monitor Apache Spark applications on Amazon EMR with Amazon Cloudwatch]( Le Clue Lubbe demonstrates how to publish detailed Spark metrics from [Amazon EMR]( to [Amazon CloudWatch]( By default, [Amazon EMR]( sends basic metrics to CloudWatch to track the activity and health of a cluster. Spark’s configurable metrics system allows metrics to be collected in a variety of sinks, including HTTP, JMX, and CSV files, but additional configuration is required to enable Spark to publish metrics to CloudWatch.
Read this post to see how you can configure those metrics, and produce nice-looking dashboards in [Amazon CloudWatch]( \[hands on]

![image.png]( "image.png")

There was more content on this topic last week, and in [Monitor your Databricks Clusters with Amazon Web Services managed open-source Services]( Munish Dabra, Lucas Vieira Souza da Silva, and Rajiv Upadhyay explored how you can leverage Amazon Web Services managed open-source services to monitor your Apache Spark workloads running on Databricks clusters. \[hands on]

![image.png]( "image.png")

**Other posts and quick reads**

* [Detect real users with Amazon Amplify and Face Liveness]( provides a nice demo and walkthrough of how to set up the Amplify Face Liveness component, an Amazon Amplify component that helps verify if your app is being used by real users, using Next.js and a REST API \[hands on]
* [Bursting your HPC applications to Amazon Web Services is now easier with Amazon File Cache and Amazon Web Services ParallelCluster]( walks you through the features of File Cache (a high-speed cache on Amazon Web Services to facilitate efficient file data processing, regardless of its storage location) that are important for HPC environments, and shows you, step-by-step, how you can quickly deploy this and try it out for yourself \[hands on]

![image.png]( "image.png")

* [Monitor real-time Amazon RDS OS metrics with flexible granularity using Enhanced Monitoring]( provides more monitoring goodness this week, showing you how you can use a feature of [Amazon RDS]( called Enhanced Monitoring, that provides an additional layer of telemetry, which can be useful during investigations that require highly granular monitoring data \[hands on]
* [Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion]( explores how you can use the Open Cybersecurity Schema Framework, an open standard for storing security events in a common and shareable format, in combination with OpenSearch to generate
actionable insights from your security events \[hands on]

![image.png]( "image.png")

* [Setting Up OpenID Connect with GitLab CI/CD to Provide Secure Access to Environments in Amazon Web Services Accounts]( provides a breakdown of three methods for connecting GitLab CI/CD pipelines to Amazon Web Services, so that as your team implements new CI/CD pipelines with Amazon Web Services or hardens the security of existing pipelines, you can evaluate the tradeoffs based on your organisational goals and security needs. \[hands on]

![image.png]( "image.png")

### Quick updates

**PostgreSQL**

Amazon Relational Database Service (RDS) for PostgreSQL now supports the Rust programming language as a new trusted procedural language in PostgreSQL major versions 13 and 14, expanding support for Rust from major version 15. This helps you build high performance user-defined functions to extend PostgreSQL for compute-intensive data processing. Rust combines the performance and resource efficiency of compiled languages like C with mechanisms that limit the risks from unsafe memory use. As a PostgreSQL trusted procedural language, PL/Rust provides memory safety so that an unprivileged user can run code in the database with minimal risk of crashing the database due to a software defect that corrupts memory. Developers can also package PL/Rust code as a Trusted Language Extension for PostgreSQL to run on [Amazon RDS](

PL/Rust version 1.2.3 with crate support for aes, ctr, and rand is available on database instances in [Amazon RDS]( running PostgreSQL 13.12 and higher, PostgreSQL 14.9 and higher, and 15.2-R2 and higher in all applicable Amazon Web Services Regions.

**Amazon Amplify**

Android, Swift, and Flutter libraries now support Time-Based One-time Passwords (TOTP) as a multi-factor authentication (MFA) method. This feature enables developers to provide their users with a secure option for validating a user’s identity after they provide their username and password.
Users of apps with TOTP enabled can register their apps with an Authenticator app such as Google Authenticator, Authy, or the Microsoft Authenticator app. After a user provides their username and password, they are presented with a challenge to complete their sign in by providing the code generated by their Authenticator app. Check out the blog post [Amazon Amplify supports Time-Based One-Time Password (TOTP) for MFA on Android, Swift, and Flutter](, where Abdallah Shaban provides a hands-on guide through this new feature. \[hands on]

![totp-flutter.gif]( "totp-flutter.gif")

**Kubernetes**

The [Amazon VPC]( Container Networking Interface (CNI) Plugin now supports the Kubernetes NetworkPolicy resource. Customers can use the same open-source [Amazon VPC]( CNI to implement both pod networking and network policies to secure the traffic in their Kubernetes clusters. This reduces the need to run additional software for network access controls and will work alongside all existing VPC CNI capabilities.

By default, in Kubernetes, any pod can talk to any other pod within a cluster with no restriction. For better network isolation, Kubernetes NetworkPolicy allows cluster administrators to secure access to and from applications by defining which entities a pod is allowed to communicate with and vice versa. However, this requires customers to use additional software to implement NetworkPolicy, often resulting in operational overhead and cost to install and maintain those third party plugins.

With support for NetworkPolicy in [Amazon VPC]( CNI, customers running Kubernetes on Amazon Web Services can now allow or deny traffic between their pods based on label selectors, namespaces, IP blocks, and ports with minimal overhead. With native VPC integration, they can secure their applications using standard components including security groups, and network access control lists (ACL), as part of additional defence-in-depth measures.
In addition, customers can trace and troubleshoot configured policies at a cluster and node level using the [Amazon VPC]( CNI plugin. Starting with VPC CNI v1.14, NetworkPolicy support is available on new clusters running Kubernetes version 1.25 and above but turned off by default at launch.

Srinivas Jasti and Sheetal Joshi have put together a detailed blog post, [Amazon VPC CNI now supports Kubernetes Network Policies]( where they demonstrate how you can enforce fine-grained control over communication, isolate workloads, and enhance the overall security of your Amazon Web Services Kubernetes clusters, all without the need to manage third-party network policy plugins.

**MySQL and MariaDB**

Amazon Relational Database Service ([Amazon RDS]( Optimized Writes now supports m6i and m6g database (DB) instances. With [Amazon RDS]( Optimized Writes you can improve the write throughput for [Amazon RDS]( for MySQL and MariaDB workloads by up to 2x at no additional cost. This is especially useful for write-intensive database workloads, commonly found in applications such as digital payments, financial trading, and online gaming.

In MySQL or MariaDB, you are protected from data loss due to unexpected events, such as a power failure, using a built-in feature called the “doublewrite buffer”. The trade-off is that each write takes up to twice as long and consumes twice as much I/O bandwidth, which reduces the throughput and performance of your database. [Amazon RDS]( Optimized Writes protects you from data loss by writing only once. With Optimized Writes you can improve write throughput by up to 2x at no additional cost. [Amazon RDS]( Optimized Writes is available as a default option from RDS for MySQL version 8.0.30 and higher, and RDS for MariaDB version 10.6.10 and higher.

**Lustre**

[Amazon FSx for Lustre](, a fully managed service that makes it easy and cost effective to launch, run, and scale the world’s most popular high-performance file system, now supports project quotas.
With project quotas, you can group multiple files or directories on your file system into a project, and monitor storage consumption on a per-project basis. Project quotas are ideal for storage administrators who manage file systems that serve multiple projects or teams and who want to ensure that no project exceeds its allocated storage capacity. Until today, you could set and enforce user- and group-level storage consumption using user quotas and group quotas. With project quotas, you can also set and enforce storage limits based on the number of files or storage capacity consumed by a specific project. You can set a hard limit to prevent projects from consuming additional storage after exceeding their project quota, or set a soft limit that provides users with a grace period to complete their workloads before converting into a hard limit.

Support for project quotas is now available at no additional cost on all [Amazon FSx for Lustre]( file systems running on Lustre version 2.15.

**OpenSearch**

Amazon Web Services User Notifications lets you centrally set up and view notifications from Amazon Web Services services, such as [Amazon OpenSearch Service](, Amazon Web Services Health events, [Amazon CloudWatch]( alarms, or EC2 Instance state changes, in a consistent, human-friendly format. Last week it was announced that you can now integrate Amazon OpenSearch Serverless with Amazon Web Services User Notifications. OpenSearch Serverless is the serverless option for [Amazon OpenSearch Service]( that makes it simple for you to run search and analytics workloads without having to think about infrastructure management.
If you are looking for more details on how you might implement this, then Raj Ramasubbu has you covered in his post, [Monitoring Amazon OpenSearch Serverless using Amazon Web Services User Notifications](

![image.png]( "image.png")

### Videos of the week

**Building a simple Lambda Authorizer using cedarpy**

Join my colleague Brandon Carroll and Amazon Web Services Community Builder Stephen Kuenzli as they take a look at building an Amazon Lambda authoriser using Stephen's open source project, cedarpy. This is something I featured in last week's newsletter, and that I used myself in my own Python-based application. Check it out over at Twitch, [on this link](\\&sort=time?trk=cndc-detail).

**How to reduce cold starts for Java Serverless applications in Amazon Web Services**

Check out Vadym Kazulkin's session at FrOSCon as he looks at the best practices, features and possibilities that Amazon Web Services offers Java developers to reduce cold start times, such as GraalVM Native Image and Amazon Lambda SnapStart, which is based on the CRaC (Coordinated Restore at Checkpoint) project.

**Level Up with Amazon Web Services SAM: The Ultimate Serverless Toolkit!**

One for all you .NET developers, Brad Knowles takes you on a journey of introducing the Amazon Serverless Application Model (SAM) toolkit into your serverless development workflow. You will see how you can take a C# .NET API from File...New Project to a fully deployed Amazon Web Services application using [Amazon API Gateway](, Amazon Lambda, and [Amazon DynamoDB]( While it may seem like magic, Brad digs into the details to demystify that magic so you walk away with a full understanding of the process. No application development journey is complete without testing. Amazon SAM has you covered here as well. This session explores SAM's secret weapon to aid in keeping that dev-test-deploy feedback loop as tight as possible. Great stuff!
**Open Source Brief**

Now featured every week in the Amazon Web Services Community Radio show, grab a quick five minute recap of the weekly open source newsletter from yours truly. Last week's issue is featured in this video. Check out the [playlist here](

**Build on Open Source**

For those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. Expect plenty of code, demos and hopefully laughs. We have put together a playlist so that you can easily access all (sixteen) of the episodes of the Build on Open Source show. [Build on Open Source playlist](

We are currently planning the third series - if you have an open source project you want to talk about, get in touch and we might be able to feature your project in future episodes of Build on Open Source.

### Events for your diary

This week, check out the Developer Webinar series, where we have three great open source topics for you. It is online, so there is still time for you to check it out.

If you are planning any events in 2023, either virtual, in person, or hybrid, get in touch as I would love to share details of your event with readers.

**Developer Webinar Series, Open Source At Amazon Web Services**\
**Online, 7th September 11am - 2pm AEST**

As part of the Developer Webinar series, we are delighted to showcase three sessions that look at open source on Amazon Web Services. We have Aish Gunasekar who will be talking about "Leveraging OpenSearch for Security Analytics". I will be doing a talk on Cedar, in my session "Next generation Authz with Cedar", and to wrap things up we have Keita Watanabe who will be looking at "Scaling LLM/GenAI deployment with NVIDIA Triton on [Amazon EKS](". The sessions are technical deep dives, and there will be Q\&A as well.

Jump over to the [registration page and sign up](, and hope to see many of you there.
**Building ML capabilities with PostgreSQL and pgvector extension**\
**YouTube, 14th September 4pm UK time**

Generative AI and Large Language Models (LLMs) are powerful technologies for building applications with richer and more personalized user experiences. Application developers who use [Amazon Aurora]( for PostgreSQL or [Amazon RDS]( for PostgreSQL can use pgvector, an open-source extension for PostgreSQL, to harness the power of generative AI and LLMs for driving richer user experiences. Register now to learn more about this powerful technology.

Watch it [live on YouTube](

**Build ML into your apps with PostgreSQL and the pgvector extension**\
**YouTube, 21st September 4pm UK time**

This office hours session is a follow up for those who attended the fireside chat titled "Building ML capabilities into your apps with PostgreSQL and the open-source pgvector extension". Others are also welcome. Office hours attendees can ask questions related to this topic. Application developers who use [Amazon Aurora]( for PostgreSQL or [Amazon RDS]( for PostgreSQL can use pgvector, an open-source extension for PostgreSQL, to harness the power of generative AI and LLMs for driving richer user experiences. Join us to ask your questions and hear the answers to the most frequently asked questions about the pgvector extension for PostgreSQL.

Watch it [live on YouTube](

**Open Source Summit, Europe**\
**September 19th-21st, Bilbao, Spain**

"Open Source Summit is the premier event for open source developers, technologists, and community leaders to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. It is the gathering place for open-source code and community contributors."

You will find Amazon Web Services as well as myself at Open Source Summit this year, so come by the Amazon Web Services booth and say hello - from the glimpses I have seen so far, it is going to be awesome!
Find out more at the official site, [Open Source Summit Europe 2023](

**OpenSearchCon**\
**Seattle, September 27-29, 2023**

Registration is now open for OpenSearchCon. Check out this post from Daryll Swager, [Registration for OpenSearchCon 2023 is now open!]( that tells you what you can expect, and provides the resources you need to help plan your trip.

**CDK Day, 2023**\
**Online, 29th September 2023**

Back for the fourth instalment, this Community led event is a must attend for anyone working with infrastructure as code using the Amazon Cloud Development Kit (CDK). It is intended to provide learning opportunities for all users of the CDK and related libraries. The event will be live streamed on YouTube, and you can find out more at the website, [CDK Day](

**Open Source India**\
**October 12-13th, NIMHANS Convention Center, Bengaluru**

One of the most important open source events in the region, Open Source India will be welcoming thousands of attendees, all there to discuss and learn about open source technologies. I will be there too, doing a talk, so I would love to meet with any of you who are also planning on attending.

Check out more details on their web page, [here](

**All Things Open**\
**October 15th-17th, Raleigh Convention Center, Raleigh, North Carolina**

I will be attending and speaking at All Things Open, looking at Apache Airflow as a container orchestrator. I will be there with a bunch of fellow Amazon Web Services colleagues, and I hope to meet some of you there. Check us out at the Amazon Web Services booth, where you will find me and the other Amazon Web Services folk throughout the event.

Check out the event and sessions/speakers at the official webpage for the event, [AllThingsOpen 2023](

**Cortex**\
**Every other Thursday, next one 16th February**

The Cortex community call happens every two weeks on Thursday, alternating at 1200 UTC and 1700 UTC. You can check out the GitHub project for more details, go to the [Community Meetings]( section.
The community calls keep a rolling doc of previous meetings, so you can catch up on the previous discussions. Check the [Cortex Community Meetings Notes]( for more info.

**OpenSearch**\
**Every other Tuesday, 3pm GMT**

This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome, and talks are welcome on topics including search, logging, log analytics, and data visualisation.

Sign up to the next session, [OpenSearch Community Meeting](

### Stay in touch with open source at Amazon Web Services

Remember to check out the [Open Source homepage](\\&opensource-all.sort-order=asc?trk=cndc-detail) to keep up to date with all our activity in open source, and follow us on [@AWSOpen](