Improved performance for lightweight transactions with Amazon Keyspaces

{"value":"[Amazon Web Services (AWS)](http://aws.amazon.com/) customers migrating their Apache Cassandra workloads to [Amazon Keyspaces (for Apache Cassandra)](https://aws.amazon.com/keyspaces/) have rediscovered Cassandra’s [lightweight transactions (LWT) API](https://cassandra.apache.org/doc/trunk/cassandra/architecture/guarantees.html#lightweight-transactions-with-linearizable-consistency). Amazon Keyspaces LWTs have consistent performance, reliable scalability, and improved isolation that allow developers to use LWTs with mission critical workloads. With Amazon Keyspaces, LWTs have similar single digit millisecond latencies as non-LWTs. Additionally, LWTs can be used in combination with non-LWTs without trading off isolation. In this post, we take a close look at Amazon Keyspaces realtime LWTs, their performance characteristics, new levels of isolation, and advanced design patterns.\n\n#### **Apache Cassandra lightweight transactions**\n\nLightweight transactions (LWT) is an Apache Cassandra API feature that allows developers to perform conditional update operations against their table data. Conditional update operations are useful when inserting, updating and deleting records based on conditions that evaluate the current state. Using this feature, developers can implement APIs with delivery semantics such as at-least-once or at-most-once, and design patterns such as optimistic locking.\n\nFigure 1 that follows is an [Amazon CloudWatch metric](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring-cloudwatch.html) showing the results of a workload executing both LWT and non-LWT operations against a table. All requests are within a millisecond variance of each other. In the case where the condition check is false, LWT is actually faster because it avoids the modify operation.\n\n![image.png](https://dev-media.amazoncloud.cn/aacdaf4a058c45babeb1909c89f61a42_image.png)\n \nFigure 1: CloudWatch metric showing LWT and non-LWT operation latencies. 
Dimensions labeled ```LWT``` represent LWT operations and dimensions labeled ```keyvalue``` represent non-LWT operations\n\n#### **Set up tables for examples**\n\nTo run the examples in this post, you must create a keyspace and a table in Amazon Keyspaces. For this post you can use the [Amazon Keyspaces CQL console](https://us-east-1.console.aws.amazon.com/keyspaces/home#cql-editor) or the [cqlsh-expansion](https://pypi.org/project/cqlsh-expansion/) library, which extends the existing cqlsh library with additional helpers and best practices for Keyspaces. Use the following commands to connect to Amazon Keyspaces with the ```cqlsh-expansion``` library and the ```SigV4AuthProvider``` for short-term credentials. This requires setting up the [AWS SDK credentials file or environment variables](https://docs.aws.amazon.com/sdkref/latest/guide/creds-config-files.html) with your [AWS Identity and Access Management (IAM)](http://aws.amazon.com/iam) access key ID, secret access key, and AWS Region.\n\n```\n# Python 2\n\nexport AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY ID>\nexport AWS_SECRET_ACCESS_KEY=<YOUR SECRET ACCESS KEY>\nexport AWS_DEFAULT_REGION=<YOUR REGION>\n\npip install --user cqlsh-expansion\n\ncqlsh-expansion.init\n\ncqlsh-expansion cassandra.us-east-1.amazonaws.com 9142 --ssl --auth-provider \"SigV4AuthProvider\"\n```\n\nFigure 2 that follows shows a user connecting to Amazon Keyspaces with the ```cqlsh-expansion``` library and ```SigV4AuthProvider``` that will use AWS credentials.\n\n![image.png](https://dev-media.amazoncloud.cn/6a078e5243654814bbc5fae9c7df04ed_image.png)\n\nFigure 2: A screenshot of a terminal using the cqlsh-expansion Python library to connect to Amazon Keyspaces\n\nStart by creating a new keyspace for your model and name it ```aws_blog```. A new keyspace requires you to use a [SingleRegionStrategy](https://docs.aws.amazon.com/keyspaces/latest/devguide/cql.ddl.html#cql.ddl.keyspace.create) replication strategy. 
Amazon Keyspaces replicates data three times across multiple availability zones. Additionally, with Amazon Keyspaces, a keyspace can be assigned AWS [resource tags](https://docs.aws.amazon.com/keyspaces/latest/devguide/tagging-keyspaces.html). Efficient tagging provides categorization that enables advanced insights to costs or simplifies managing IAM policies.\n\n```\nCREATE KEYSPACE aws_blog\nWITH REPLICATION = {'class': 'SingleRegionStrategy'}\n AND TAGS = {'blog' : 'Lightweight Transactions'};\n```\n\nAfter the ```aws_blog``` keyspace is ACTIVE, create a new table called ```account_user_profile```. The table is based on a common one-to-many relationship between account and profiles. In this model ```account_id``` is the partition key. Each account_id contains multiple rows sorted by a clustering key column profile_id. The combination of partition key and clustering key form the primary key for key value access to a row. The table also includes various data types such as text, int, timestamps, and static columns, which will be used in the examples in this post.\n\n```\nCREATE TABLE aws_blog.account_user_profile(\n account_id text,\n profile_id text,\n name text,\n playing_time int,\n create_time timestamp,\n modify_time timestamp,\n profile_version text,\n account_version text static,\n account_playing_time int static,\n PRIMARY KEY(account_id, profile_id))\nWITH CUSTOM_PROPERTIES = {\n 'capacity_mode':{\n 'throughput_mode':'PAY_PER_REQUEST'\n }, \n 'point_in_time_recovery':{\n 'status':'enabled'\n }, \n 'encryption_specification':{\n 'encryption_type':'AWS_OWNED_KMS_KEY'\n }\n} AND TAGS = {'blog' : 'Lightweight Transactions'};\n```\n\nUse the following select statement to query the system tables for ```ACTIVE``` status.\n\n```\nSELECT keyspace_name, table_name, status \nFROM system_schema_mcs.tables \nWHERE keyspace_name = 'aws_blog' AND table_name = 'account_user_profile';\n```\n\nFigure 3 that follows shows the output of querying the system table for the 
status of the table you just created. Once the table has ```ACTIVE``` status, it’s ready for reads and writes.\n\n![image.png](https://dev-media.amazoncloud.cn/372456c31ce1491fb93fb6ff24a292f8_image.png)\n\nFigure 3: Terminal window showing output of a select statement showing table status as ACTIVE\n\n#### **Checking for existence before insert, update, or delete**\n\nNow that you have created a keyspace and table, you can use them to learn how LWTs insert, update, and delete rows based on existence. In many cases, NoSQL models are designed for [idempotent](https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/) APIs, where a modify request can be retried or repeated without changing the end state. There are cases where non-idempotent modifications are required. This is where LWTs can simplify the design. With LWTs, you can use the ```IF EXISTS``` or ```IF NOT EXISTS``` clause to perform an existence check on modification.\n\nFor example, you’re working on a gaming use case to store player profile information in Amazon Keyspaces. There’s a requirement to store the ```create_time``` and ```modify_time``` for each profile. One approach is to store the ```create_time``` value only on the initial insert. After the initial insert, you update only the ```modify_time``` of the profile. The challenge is determining whether the profile exists at the time of insert. Failure to do so could overwrite an existing value of ```create_time```. You can use LWTs to insert a row only when ```IF NOT EXISTS``` is true, to ensure that ```create_time``` is stored at most once even in the event of retries.\n\nUsing the following CQL statement, insert a new player profile with the current time. 
On first insert, populate both ```create_time``` and ```modify_time``` with the current time by using the functions [toTimestamp()](https://cassandra.apache.org/doc/latest/cassandra/cql/functions.html#time-conversion-functions) and [now()](https://cassandra.apache.org/doc/latest/cassandra/cql/functions.html#now). For cqlsh, you first need to set consistency to ```LOCAL_QUORUM``` for strong consistency.\n\n```\nCONSISTENCY LOCAL_QUORUM;\n\nINSERT INTO aws_blog.account_user_profile (account_id, profile_id, name, playing_time, create_time, modify_time)\n VALUES ('unique_account_A', 'unique_profile_1', 'Mike', 500, toTimestamp(now()), toTimestamp(now()))\n IF NOT EXISTS;\n```\n\nAfter running the insert you should notice a response indicating the insert was applied. The output, shown in Figure 4 that follows, shows ```applied``` is equal to ```true```.\n\n![image.png](https://dev-media.amazoncloud.cn/c22594070822448e9354dccbcf7bc951_image.png)\n\nFigure 4: Terminal window showing the output of a LWT inserting a new row\n\nRunning this statement again results in a conditional check failure, as shown in Figure 5 that follows. The value for ```applied``` is equal to ```false``` and the state of the current row is returned as a result. Since this statement can only succeed once, it’s useful when developers are trying to prevent overwriting data or creating a ledger of immutable rows.\n![image.png](https://dev-media.amazoncloud.cn/6f549cad440940b0b9d2dea16bb0df3a_image.png)\n\nFigure 5: Terminal window showing the output of a conditional check failure\n\n#### **Update only if a row exists**\n\nAfter you’ve captured the ```create_time``` on initial insert, you need to change the ```modify_time``` only for existing records. For example, you have a new requirement to persist a stream of profile changes. The stream consists mostly of updates to existing profiles, but there is a 5 percent chance that some events will require creating a new profile. 
You can use LWT to modify rows only if the row already exists. Using ```IF EXISTS```, you can ensure that ```modify_time``` is updated for a row at least once.\n\nIn the following example, you can run a LWT using an ```IF EXISTS``` clause to update multiple columns of an existing player profile. The following update statement updates the name to John and the ```modify_time``` to the current time.\n\n```\nUPDATE aws_blog.account_user_profile \n SET name = 'John', \n modify_time = toTimestamp(now())\n WHERE account_id='unique_account_A' AND profile_id='unique_profile_1'\n IF EXISTS;\n```\n\n\nFigure 6 that follows shows the output of a successful update statement. Again, like the insert statement in the previous example, the result is true for the applied field. If false, the row doesn’t exist and there is no additional information to return.\n\n![image.png](https://dev-media.amazoncloud.cn/f82f4848b65d4cd2b160020ad2f178e5_image.png)\nFigure 6: Terminal window showing the output of successful update of an existing row using LWT\n\nCassandra’s default behavior is to treat both ```INSERT``` and ```UPDATE``` as ```UPSERT```. With LWT, you can effectively change the default behavior of ```INSERT``` and ```UPDATE``` statements to be explicit about the type of modification you want. With ```IF NOT EXISTS``` and ```IF EXISTS```, an ```INSERT``` will only insert new rows and an ```UPDATE``` will only update existing rows.\n\n#### **Implement optimistic locking for a row**\n\nA common pattern in NoSQL is to use optimistic locking when modifications require the latest state. For example, you have a new requirement to track the total playing time for a player profile. You receive increments of playing time to be aggregated and stored as the total sum of playing time. To track the total playing time, you need to add each increment to the current value and update the profile with the new total. 
Since there can be multiple clients updating the ```playing_time``` field, you can implement an optimistic locking pattern with LWT to have better guarantees around updating the latest total.\n\nStart by inserting a new profile with a ```profile_version``` of ```v0``` and ```playing_time``` of ```500``` seconds using ```IF NOT EXISTS``` to ensure at-most-once delivery. You can then query the profile to select the current values.\n\n```\nINSERT INTO aws_blog.account_user_profile (account_id, profile_id, name, playing_time, create_time, modify_time, profile_version)\n VALUES ('unique_account_B', 'unique_profile_2', 'Emma', 500, toTimestamp(now()), toTimestamp(now()), 'v0')\n IF NOT EXISTS;\n \n SELECT profile_version, playing_time \n FROM aws_blog.account_user_profile \n WHERE account_id = 'unique_account_B' AND profile_id = 'unique_profile_2';\n```\n\nFigure 7 that follows shows successful insert of a new row and verification of the results. The results of the select statement are used as inputs to the next steps.\n\n![image.png](https://dev-media.amazoncloud.cn/7b2fbece2996426884031c0f619202b7_image.png)\n\nFigure 7: Terminal window showing the successful output of a LWT and the retrieval of the latest values\n\nNow that you have the first version in place, next you update the playing time by 200. You need to add 200 to the current value of 500 and store it back as an aggregated value of 700. By using a conditional check on the ```profile_version``` you can ensure ```playing_time``` wasn’t modified before storing the updated version. You also want to increment the ```profile_version``` to reflect the change to the profile.\n\n```\n UPDATE aws_blog.account_user_profile \n SET playing_time = 700, \n profile_version = 'v1',\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_B' and profile_id = 'unique_profile_2'\n IF profile_version = 'v0' and playing_time != 700;\n```\nVerify the results by running the SELECT statement. 
You should receive a ```profile_version``` of v1 and a ```playing_time``` of 700.\n\n```\n SELECT profile_version, playing_time \n FROM aws_blog.account_user_profile \n WHERE account_id = 'unique_account_B' AND profile_id = 'unique_profile_2';\n```\n\nThe output in Figure 8 that follows shows a successful update to the profile’s playing time. Verification of results shows the playing time of 700 and the version number has increased to v1.\n\n![image.png](https://dev-media.amazoncloud.cn/8f088a36c4454c999a535ba2c4a01cd4_image.png)\n\nFigure 8: Terminal window showing the successful output of a LWT and retrieval of the latest values\n\nAs an experiment, repeat the previous ```UPDATE``` statement to see it result in a conditional check failure. The result will contain the current values for ```profile_version``` and ```playing_time```. You can now retry the operation with the latest values. The output in Figure 9 that follows shows an example of a conditional check failure response for an update statement.\n\n![image.png](https://dev-media.amazoncloud.cn/e469aab50eb74ac2a50da51201e0f231_image.png)\nFigure 9: Terminal window showing a conditional check failure of a LWT attempting to update a row with a different version number than expected\n\n#### **Implement partition-level optimistic locking using LWT and static columns**\n\nIn the previous example, LWTs were used to create optimistic locking on a row. You can also perform optimistic locking on a logical partition by using LWTs and static columns. When you declare a column in an Amazon Keyspaces table as static, the cell value of the static column is shared among all rows in a logical partition. By using a static column with LWTs, you can implement a mechanism for versioning partition modifications. 
This pattern allows applications to perform atomic modification on the current state of a partition.\n\nFor example, in the previous example you updated the playing time for an individual profile, but now the requirement is to also aggregate playing time for every profile in a given account. You now have to manage access of multiple writers across multiple rows. By using LWTs and static columns, you can modify the static column cell value and row data atomically while also applying conditions that enable optimistic locking.\n\nBefore continuing, it’s important to review the following:\n\n- With NoSQL you can still model relationships. The data model in this example demonstrates a one-to-many relationship between an account and profiles. The account is represented by the partition key and static columns. Profiles are represented by the clustering key and the row’s cells.\n- In the Cassandra API, inserts and updates are upserts. An update will insert a new row if one is not already present as long as you provide the primary key in the where clause.\n- You can use equality and inequalities =, <, <=, >, >=, and != in your LWT IF statement. You can also check for null and access items in a collection.\n\nFigure 10 that follows is a visual representation of the model of account and profile. Unique accounts are represented by the partition key. Account data is stored in static columns, which can be retrieved by returning any row in the partition. Each unique profile is represented as a row in a partition. Reading any row will also return both profile and account data.\n\n![image.png](https://dev-media.amazoncloud.cn/3656e08ddf3d4b58a861003f575fae73_image.png)\nFigure 10: Visualization of an account user profile table mapped to Cassandra data modeling concepts of partition key, clustering key, rows, and static columns\n\nYou can run a select statement for the ```account_id``` of ```unique_account_C```. 
This account and partition shouldn’t exist yet, so the query returns 0 rows. You can reuse this command throughout this example to verify the current partition state.\n\n```\nSELECT * \nFROM aws_blog.account_user_profile \nWHERE account_id = 'unique_account_C';\n```\n\n#### **Insert a new profile to an account**\n\nFirst, insert a new account and profile. Make sure that neither the partition nor row exists yet. You can do this by performing an LWT with a conditional check on null for ```profile_version``` and ```account_version```. The ```profile_version``` acts as an optimistic lock for the row and ```account_version``` acts as an optimistic lock for the partition. If both are null, then the update will perform an insert. Including a check on ```profile_version``` might seem redundant, but it serves as a useful mechanism for returning the current value in case of a conditional check failure.\n\n```\nUPDATE aws_blog.account_user_profile\n SET playing_time = 700,\n name = 'Emma',\n profile_version = 'pv0',\n account_playing_time = 700,\n account_version = 'av0',\n create_time = toTimestamp(now()),\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_1'\n IF profile_version = null AND account_version = null;\n```\n\nFigure 11 that follows shows the results of the initial insert. You can see that ```account_version``` and ```profile_version``` are initialized as ```av0``` and ```pv0```, and that ```playing_time``` and ```account_playing_time``` are both equal to ```700```. The fields ```create_time``` and ```modify_time``` are omitted in the select statement for readability.\n\n![image.png](https://dev-media.amazoncloud.cn/5de4377f5ee14d10968dc7717b4f9bca_image.png)\nFigure 11: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id\n\n#### **Add a second profile to an account**\n\nNext, modify the account by adding an additional profile. 
Similar to the previous step, you can perform this operation by checking the ```account_version``` static column and ```profile_version```. The new profile has a playing time of ```600```, so you need to adjust the ```account_playing_time``` for a total of ```1300```. The following statement makes a conditional check on the ```account_version```, modifies the ```account_playing_time```, and inserts a new row.\n\n```\nUPDATE aws_blog.account_user_profile\n SET playing_time = 600,\n name = 'Brie',\n profile_version = 'pv0',\n account_playing_time = 1300,\n account_version = 'av1',\n create_time = toTimestamp(now()),\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_2'\n IF profile_version = null AND account_version = 'av0'; \nSELECT * \nFROM aws_blog.account_user_profile \nWHERE account_id = 'unique_account_C';\n```\n\nFigure 12 that follows shows output that verifies the state after adding an additional profile. You can see that ```account_version``` is incremented and ```profile_version``` is initialized as version ```0```. The ```account_playing_time``` was updated to ```1300``` to represent the aggregate of both profiles’ playing_time. Again, the fields ```create_time``` and ```modify_time``` are omitted in the select statement for readability.\n\n![image.png](https://dev-media.amazoncloud.cn/835d76b7b97948cea82a0652db20fce3_image.png)\nFigure 12: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id\n\n#### **Modify an existing profile in an account**\n\nFinally, update the account and an existing profile. You update the account and profile only if conditional checks on ```profile_version``` and ```account_version``` succeed. 
The following update increments the ```playing_time``` and ```account_playing_time``` values by ```500```.\n\n```\nUPDATE aws_blog.account_user_profile\n SET playing_time = 1200,\n profile_version = 'pv1',\n account_playing_time = 1800,\n account_version = 'av2',\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_1'\n IF profile_version = 'pv0' AND account_version = 'av1';\n```\n\nFigure 13 that follows shows output verifying the state after modifying an existing profile. You can see that ```account_version``` is incremented again and that profile_version of ```unique_profile_1``` was incremented as well. The ```account_playing_time``` was updated to ```1800``` to represent the new aggregate of both profiles’ ```playing_time```. Again, the fields ```create_time``` and modify_time are omitted in the select statement for readability.\n\n![image.png](https://dev-media.amazoncloud.cn/15ae256636d0470591528aa814e8f2df_image.png)\nFigure 13: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id\n\n#### **Monitor conditional check failure requests**\n\nWhen a LWT condition equals false, the transaction is rejected and the service emits an Amazon CloudWatch metric reflecting the number of failed conditional checks. Developers and system administrators can monitor this behavior at scale with CloudWatch. In CloudWatch you will find a metric under AWS/Cassandra called ```ConditionalCheckFailedRequests```.\n\n![image.png](https://dev-media.amazoncloud.cn/735b39a5f0564e4e892ee11412b7c219_image.png)\nFigure 14: Amazon CloudWatch metric displaying the number of ConditionalCheckFailedRequests over an hour period\n\n#### **Estimate capacity utilization**\n\nNow that you can monitor LWTs you can better estimate capacity utilization. All writes require ```LOCAL_QUORUM``` consistency and there is no additional charge for using LWTs. 
The difference from non-LWT operations is that a LWT still consumes write capacity units when its condition check results in ```FALSE```. The number of write capacity units consumed depends on the size of the row. If the row size is 2 KB, the failed conditional write consumes two write capacity units. If the row doesn’t currently exist in the table, the operation consumes one write capacity unit. By monitoring ```ConditionalCheckFailedRequests``` you can determine the capacity consumed by LWT condition check failures.\n\n#### **Clean up Amazon Keyspaces resources**\n\nTo finish, you can clean up the resources used in this post by dropping the ```aws_blog``` keyspace. The following command will drop all tables in a given keyspace before deleting the keyspace itself. If you enabled point-in-time recovery (PITR) when creating the table, you can restore the ```aws_blog.account_user_profile``` table for the next 35 days.\n\n```\nDROP KEYSPACE aws_blog;\n```\n\n#### **Conclusion**\n\nIn this post you learned about the improved performance characteristics of the Amazon Keyspaces LWT API, advanced design patterns, and operational best practices. LWTs in Amazon Keyspaces have single-digit millisecond latencies and allow you to mix and match LWT and non-LWT operations without losing isolation. You can use LWTs to implement advanced design patterns such as optimistic locking. You can use CloudWatch to monitor successful LWT operations, latencies, and conditional check failures.\n\nFor more information about LWTs and Amazon Keyspaces, check out the [Scaling Data](https://aws.amazon.com/keyspaces/scaling-data/) video resources on the official Amazon Keyspaces product page. In these videos we cover Amazon Keyspaces use cases, serverless architecture, and application modernization.\n\n#### **About the Authors**\n\n![image.png](https://dev-media.amazoncloud.cn/2cb5da0c73ef4056a3c095e81cb307a0_image.png)\n\n**Michael Raney** is a Senior Specialist Solutions Architect based in New York and leads the field for Amazon Keyspaces. 
He works with customers to modernize their legacy database workloads to a serverless architecture. Michael has spent over a decade building distributed systems for high scale and low latency.","render":"<p><a href=\"http://aws.amazon.com/\" target=\"_blank\">Amazon Web Services (AWS)</a> customers migrating their Apache Cassandra workloads to <a href=\"https://aws.amazon.com/keyspaces/\" target=\"_blank\">Amazon Keyspaces (for Apache Cassandra)</a> have rediscovered Cassandra’s <a href=\"https://cassandra.apache.org/doc/trunk/cassandra/architecture/guarantees.html#lightweight-transactions-with-linearizable-consistency\" target=\"_blank\">lightweight transactions (LWT) API</a>. Amazon Keyspaces LWTs have consistent performance, reliable scalability, and improved isolation that allow developers to use LWTs with mission critical workloads. With Amazon Keyspaces, LWTs have similar single digit millisecond latencies as non-LWTs. Additionally, LWTs can be used in combination with non-LWTs without trading off isolation. In this post, we take a close look at Amazon Keyspaces realtime LWTs, their performance characteristics, new levels of isolation, and advanced design patterns.</p>\n<h4><a id=\"Apache_Cassandra_lightweight_transactions_2\"></a><strong>Apache Cassandra lightweight transactions</strong></h4>\n<p>Lightweight transactions (LWT) is an Apache Cassandra API feature that allows developers to perform conditional update operations against their table data. Conditional update operations are useful when inserting, updating and deleting records based on conditions that evaluate the current state. 
Using this feature, developers can implement APIs with delivery semantics such as at-least-once or at-most-once, and design patterns such as optimistic locking.</p>\n<p>Figure 1 that follows is an <a href=\"https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring-cloudwatch.html\" target=\"_blank\">Amazon CloudWatch metric</a> showing the results of a workload executing both LWT and non-LWT operations against a table. All requests are within a millisecond variance of each other. In the case where the condition check is false, LWT is actually faster because it avoids the modify operation.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/aacdaf4a058c45babeb1909c89f61a42_image.png\" alt=\"image.png\" /></p>\n<p>Figure 1: CloudWatch metric showing LWT and non-LWT operation latencies. Dimensions labeled <code>LWT</code> represent LWT operations and dimensions labeled <code>keyvalue</code> are non-LWT</p>\n<h4><a id=\"Set_up_tables_for_examples_12\"></a><strong>Set up tables for examples</strong></h4>\n<p>To run the examples in this post, you must create a keyspace and a table in Amazon Keyspaces. For this post you can use the <a href=\"https://us-east-1.console.aws.amazon.com/keyspaces/home#cql-editor\" target=\"_blank\">Amazon Keyspaces CQL console</a> or the <a href=\"https://pypi.org/project/cqlsh-expansion/\" target=\"_blank\">cqlsh-expansion</a> library, which extends the existing cqlsh library with additional helpers and best practices for Keyspaces. Use the following command to connect to Keyspaces using the <code>cqlsh-expansion</code> library using the <code>SigV4AuthProvider</code> for short term credentials. 
This requires setting up the <a href=\"https://docs.aws.amazon.com/sdkref/latest/guide/creds-config-files.html\" target=\"_blank\">AWS SDK credentials file or environment variables</a> with your <a href=\"http://aws.amazon.com/iam\" target=\"_blank\">AWS Identity and Access Management (IAM)</a> access key ID, secret access key, and AWS Region.</p>\n<pre><code class=\"lang-\"># Python 2\n\nexport AWS_ACCESS_KEY_ID=&lt;YOUR ACCESS KEY ID&gt;\nexport AWS_SECRET_ACCESS_KEY=&lt;YOUR SECRET ACCESS KEY&gt;\nexport AWS_DEFAULT_REGION=&lt;YOUR REGION&gt;\n\npip install --user cqlsh-expansion\n\ncqlsh-expansion.init\n\ncqlsh-expansion cassandra.us-east-1.amazonaws.com 9142 --ssl --auth-provider &quot;SigV4AuthProvider&quot;\n</code></pre>\n<p>Figure 2 that follows shows a user connecting to Amazon Keyspaces with the <code>cqlsh-expansion</code> library and <code>SigV4AuthProvider</code> that will use AWS credentials.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/6a078e5243654814bbc5fae9c7df04ed_image.png\" alt=\"image.png\" /></p>\n<p>Figure 2: A screenshot of a terminal using the cqlsh-expansion Python library to connect to Amazon Keyspaces</p>\n<p>Start by creating a new keyspace for your model and name it <code>aws_blog</code>. A new keyspace requires you to use a <a href=\"https://docs.aws.amazon.com/keyspaces/latest/devguide/cql.ddl.html#cql.ddl.keyspace.create\" target=\"_blank\">SingleRegionStrategy</a> replication strategy. Amazon Keyspaces replicates data three times across multiple availability zones. Additionally, with Amazon Keyspaces, a keyspace can be assigned AWS <a href=\"https://docs.aws.amazon.com/keyspaces/latest/devguide/tagging-keyspaces.html\" target=\"_blank\">resource tags</a>. 
Efficient tagging provides categorization that enables advanced insights to costs or simplifies managing IAM policies.</p>\n<pre><code class=\"lang-\">CREATE KEYSPACE aws_blog\nWITH REPLICATION = {'class': 'SingleRegionStrategy'}\n AND TAGS = {'blog' : 'Lightweight Transactions'};\n</code></pre>\n<p>After the <code>aws_blog</code> keyspace is ACTIVE, create a new table called <code>account_user_profile</code>. The table is based on a common one-to-many relationship between account and profiles. In this model <code>account_id</code> is the partition key. Each account_id contains multiple rows sorted by a clustering key column profile_id. The combination of partition key and clustering key form the primary key for key value access to a row. The table also includes various data types such as text, int, timestamps, and static columns, which will be used in the examples in this post.</p>\n<pre><code class=\"lang-\">CREATE TABLE aws_blog.account_user_profile(\n account_id text,\n profile_id text,\n name text,\n playing_time int,\n create_time timestamp,\n modify_time timestamp,\n profile_version text,\n account_version text static,\n account_playing_time int static,\n PRIMARY KEY(account_id, profile_id))\nWITH CUSTOM_PROPERTIES = {\n 'capacity_mode':{\n 'throughput_mode':'PAY_PER_REQUEST'\n }, \n 'point_in_time_recovery':{\n 'status':'enabled'\n }, \n 'encryption_specification':{\n 'encryption_type':'AWS_OWNED_KMS_KEY'\n }\n} AND TAGS = {'blog' : 'Lightweight Transactions'};\n</code></pre>\n<p>Use the following select statement to query the system tables for <code>ACTIVE</code> status.</p>\n<pre><code class=\"lang-\">SELECT keyspace_name, table_name, status \nFROM system_schema_mcs.tables \nWHERE keyspace_name = 'aws_blog' AND table_name = 'account_user_profile';\n</code></pre>\n<p>Figure 3 that follows shows the output of querying the system table for the status of the table you just created. 
Once the table has <code>ACTIVE</code> status, it’s ready for reads and writes.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/372456c31ce1491fb93fb6ff24a292f8_image.png\" alt=\"image.png\" /></p>\n<p>Figure 3: Terminal window showing output of a select statement showing table status as ACTIVE</p>\n<h4><a id=\"Checking_for_existence_before_insert_update_or_delete_85\"></a><strong>Checking for existence before insert, update, or delete</strong></h4>\n<p>Now that you have created a keyspace and table, you can use them to learn how LWTs insert, update, and delete rows based on existence. In many cases, NoSQL models are designed for <a href=\"https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/\" target=\"_blank\">idempotent</a> APIs, where a modify request can be retried or repeated without changing the end state. There are cases where non-idempotent modifications are required. This is where LWTs can simplify the design. With LWTs, you can use the <code>IF EXISTS</code> or <code>IF NOT EXISTS</code> clause to perform an existence check on modification.</p>\n<p>For example, you’re working on a gaming use case to store player profile information in Amazon Keyspaces. There’s a requirement to store the <code>create_time</code> and <code>modify_time</code> for each profile. One approach is to store the <code>create_time</code> value only on the initial insert. After the initial insert, you update only the <code>modify_time</code> of the profile. The challenge is determining whether the profile exists at the time of insert. Failure to do so could overwrite an existing value of <code>create_time</code>. You can use LWTs to insert a row only when <code>IF NOT EXISTS</code> is true, to ensure that <code>create_time</code> is stored at most once even in the event of retries.</p>\n<p>Using the following CQL statement, insert a new player profile with the current time. 
On first insert, populate both <code>create_time</code> and <code>modify_time</code> with the current time by using the functions <a href=\"https://cassandra.apache.org/doc/latest/cassandra/cql/functions.html#time-conversion-functions\" target=\"_blank\">toTimestamp()</a> and <a href=\"https://cassandra.apache.org/doc/latest/cassandra/cql/functions.html#now\" target=\"_blank\">now()</a>. For cqlsh, you first need to set consistency to <code>LOCAL_QUORUM</code> for strong consistency.</p>\n<pre><code class=\"lang-\">CONSISTENCY LOCAL_QUORUM;\n\nINSERT INTO aws_blog.account_user_profile (account_id, profile_id, name, playing_time, create_time, modify_time)\n VALUES ('unique_account_A', 'unique_profile_1', 'Mike', 500, toTimestamp(now()), toTimestamp(now()))\n IF NOT EXISTS;\n</code></pre>\n<p>After running the insert you should notice a response indicating the insert was applied. The output, shown in Figure 4 that follows, shows <code>applied</code> is equal to <code>true</code>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/c22594070822448e9354dccbcf7bc951_image.png\" alt=\"image.png\" /></p>\n<p>Figure 4: Terminal window showing the output of a LWT inserting a new row</p>\n<p>Running this statement again results in a conditional check failure, as shown in Figure 5 that follows. The value for <code>applied</code> is equal to <code>false</code> and the state of the current row is returned as a result. 
Since this statement can only succeed once, it’s useful when developers want to prevent overwriting data or to create a ledger of immutable rows.<br />\n<img src=\"https://dev-media.amazoncloud.cn/6f549cad440940b0b9d2dea16bb0df3a_image.png\" alt=\"image.png\" /></p>\n<p>Figure 5: Terminal window showing the output of a conditional check failure</p>\n<h4><a id=\"Update_only_if_a_row_exists_112\"></a><strong>Update only if a row exists</strong></h4>\n<p>After you’ve captured the <code>create_time</code> on initial insert, you need to change the <code>modify_time</code> only for existing records. For example, you have a new requirement to persist a stream of profile changes. The stream consists mostly of updates to existing profiles, but there is a 5 percent chance that an event will require creating a new profile. You can use an LWT to modify a row only if the row already exists. Using <code>IF EXISTS</code>, you can ensure that <code>modify_time</code> is updated only for rows that already exist.</p>\n<p>In the following example, you run an LWT with an <code>IF EXISTS</code> clause to update multiple columns of an existing player profile. The update statement sets the <code>name</code> to John and the <code>modify_time</code> to the current time.</p>\n<pre><code class=\"lang-\">UPDATE aws_blog.account_user_profile \n SET name = 'John', \n modify_time = toTimestamp(now())\n WHERE account_id='unique_account_A' AND profile_id='unique_profile_1'\n IF EXISTS;\n</code></pre>\n<p>Figure 6 that follows shows the output of a successful update statement. As with the insert statement in the previous example, the <code>applied</code> field is <code>true</code>. 
If <code>applied</code> is <code>false</code>, the row doesn’t exist and there is no additional information to return.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/f82f4848b65d4cd2b160020ad2f178e5_image.png\" alt=\"image.png\" /><br />\nFigure 6: Terminal window showing the output of successful update of an existing row using LWT</p>\n<p>Cassandra’s default behavior is to treat both <code>INSERT</code> and <code>UPDATE</code> as an upsert. With LWTs, you can effectively change this default so that each statement is explicit about the type of modification you want. With <code>IF NOT EXISTS</code> and <code>IF EXISTS</code>, <code>INSERT</code> statements only insert new rows and <code>UPDATE</code> statements only update existing rows.</p>\n<h4><a id=\"Implement_optimistic_locking_for_a_row_134\"></a><strong>Implement optimistic locking for a row</strong></h4>\n<p>A common pattern in NoSQL is to use optimistic locking when modifications require the latest state. For example, you have a new requirement to track the total playing time for a player profile. You receive increments of playing time to be aggregated and stored as the total sum of playing time. To track the total, you need to read the current value, add the increment, and update the profile with the new total. Since multiple clients can update the <code>playing_time</code> field, you can implement an optimistic locking pattern with LWTs to guarantee that each update is applied to the latest total.</p>\n<p>Start by inserting a new profile with a <code>profile_version</code> of <code>v0</code> and <code>playing_time</code> of <code>500</code> seconds using <code>IF NOT EXISTS</code> to ensure at-most-once delivery. 
You can then query the profile to select the current values.</p>\n<pre><code class=\"lang-\">INSERT INTO aws_blog.account_user_profile (account_id, profile_id, name, playing_time, create_time, modify_time, profile_version)\n VALUES ('unique_account_B', 'unique_profile_2', 'Emma', 500, toTimestamp(now()), toTimestamp(now()), 'v0')\n IF NOT EXISTS;\n \n SELECT profile_version, playing_time \n FROM aws_blog.account_user_profile \n WHERE account_id = 'unique_account_B' AND profile_id = 'unique_profile_2';\n</code></pre>\n<p>Figure 7 that follows shows successful insert of a new row and verification of the results. The results of the select statement are used as inputs to the next steps.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/7b2fbece2996426884031c0f619202b7_image.png\" alt=\"image.png\" /></p>\n<p>Figure 7: Terminal window showing the successful output of a LWT and the retrieval of the latest values</p>\n<p>Now that you have the first version in place, next you update the playing time by 200. You need to add 200 to the current value of 500 and store it back as an aggregated value of 700. By using a conditional check on the <code>profile_version</code> you can ensure <code>playing_time</code> wasn’t modified before storing the updated version. You also want to increment the <code>profile_version</code> to reflect the change to the profile.</p>\n<pre><code class=\"lang-\"> UPDATE aws_blog.account_user_profile \n SET playing_time = 700, \n profile_version = 'v1',\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_B' and profile_id = 'unique_profile_2'\n IF profile_version = 'v0' and playing_time != 700;\n</code></pre>\n<p>Verify the results by running the SELECT statement. 
You should see a <code>profile_version</code> of <code>v1</code> and a <code>playing_time</code> of <code>700</code>.</p>\n<pre><code class=\"lang-\"> SELECT profile_version, playing_time \n FROM aws_blog.account_user_profile \n WHERE account_id = 'unique_account_B' AND profile_id = 'unique_profile_2';\n</code></pre>\n<p>The output in Figure 8 that follows shows a successful update to the profile’s playing time. Verification of the results shows a playing time of 700 and a version that has increased to v1.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/8f088a36c4454c999a535ba2c4a01cd4_image.png\" alt=\"image.png\" /></p>\n<p>Figure 8: Terminal window showing the successful output of a LWT and retrieval of the latest values</p>\n<p>As an experiment, repeat the previous <code>UPDATE</code> statement to see the conditional check fail. The result will contain the current values for <code>profile_version</code> and <code>playing_time</code>. You can now retry the operation with the latest values. The output in Figure 9 that follows shows an example of a conditional check failure response for an update statement.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/e469aab50eb74ac2a50da51201e0f231_image.png\" alt=\"image.png\" /><br />\nFigure 9: Terminal window showing a conditional check failure of a LWT attempting to update a row with a different version number than expected</p>\n<h4><a id=\"Implement_partitionlevel_optimistic_locking_using_LWT_and_static_columns_185\"></a><strong>Implement partition-level optimistic locking using LWT and static columns</strong></h4>\n<p>In the previous example, LWTs were used to create optimistic locking on a row. You can also perform optimistic locking on a logical partition by using LWTs and static columns. When you declare a column in an Amazon Keyspaces table as static, the cell value of the static column is shared among all rows in a logical partition. By using a static column with LWTs, you can implement a mechanism for versioning partition modifications. 
This pattern allows applications to perform atomic modification on the current state of a partition.</p>\n<p>For example, in the previous example you updated the playing time for an individual profile, but now the requirement is to also aggregate playing time for every profile in a given account. You now have to manage access of multiple writers across multiple rows. By using LWTs and static columns, you can modify the static column cell value and row data atomically while also applying conditions that enable optimistic locking.</p>\n<p>Before continuing, it’s important to review the following:</p>\n<ul>\n<li>With NoSQL you can still model relationships. The data model in this example demonstrates a one-to-many relationship between an account and profiles. The account is represented by the partition key and static columns. Profiles are represented by the clustering key and the row’s cells.</li>\n<li>In the Cassandra API, inserts and updates are upserts. An update will insert a new row if one is not already present as long as you provide the primary key in the where clause.</li>\n<li>You can use equality and inequalities =, &lt;, &lt;=, &gt;, &gt;=, and != in your LWT IF statement. You can also check for null and access items in a collection.</li>\n</ul>\n<p>Figure 10 that follows is a visual representation of the model of account and profile. Unique accounts are represented by the partition key. Account data is stored in static columns, which can be retrieved by returning any row in the partition. Each unique profile is represented as a row in a partition. 
Reading any row will also return both profile and account data.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/3656e08ddf3d4b58a861003f575fae73_image.png\" alt=\"image.png\" /><br />\nFigure 10: Visualization of an account user profile table mapped to Cassandra data modeling concepts of partition key, clustering key, rows, and static columns</p>\n<p>You can run a select statement for the <code>account_id</code> of <code>unique_account_C</code>. This account and partition shouldn’t exist yet, and will result in 0 rows returned. You can reuse this command throughout this example to verify the current partition state.</p>\n<pre><code class=\"lang-\">SELECT * \nFROM aws_blog.account_user_profile \nWHERE account_id = 'unique_account_C';\n</code></pre>\n<h4><a id=\"Insert_a_new_profile_to_an_account_209\"></a><strong>Insert a new profile to an account</strong></h4>\n<p>First, insert a new account and profile. Make sure that neither the partition nor row exists yet. You can do this by performing an LWT with a conditional check on null for <code>profile_version</code> and <code>account_version</code>. The <code>profile_version</code> acts as an optimistic lock for the row and <code>account_version</code> acts as an optimistic lock for the partition. If both are null, then the update will perform an insert. Including a check on <code>profile_version</code> might seem redundant, but it serves as a useful mechanism for returning the current value in case of a conditional check failure.</p>\n<pre><code class=\"lang-\">UPDATE aws_blog.account_user_profile\n SET playing_time = 700,\n name = 'Emma',\n profile_version = 'pv0',\n account_playing_time = 700,\n account_version = 'av0',\n create_time = toTimestamp(now()),\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_1'\n IF profile_version = null AND account_version = null;\n</code></pre>\n<p>Figure 11 that follows shows the results of the initial insert. 
You can see that both <code>account_version</code> and <code>profile_version</code> are initialized as version <code>0</code> and that <code>playing_time</code> and <code>account_playing_time</code> are both equal to <code>700</code>. The fields <code>create_time</code> and <code>modify_time</code> are omitted in the select statement for readability.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/5de4377f5ee14d10968dc7717b4f9bca_image.png\" alt=\"image.png\" /><br />\nFigure 11: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id</p>\n<h4><a id=\"Add_a_second_profile_to_an_account_231\"></a><strong>Add a second profile to an account</strong></h4>\n<p>Next, modify the account by adding an additional profile. Similar to the previous step, you can perform this operation by checking the <code>account_version</code> static column and <code>profile_version</code>. The new profile has a playing time of <code>600</code>, so you need to adjust the <code>account_playing_time</code> to a total of <code>1300</code>. The following statement makes a conditional check on the <code>account_version</code>, modifies the <code>account_playing_time</code>, and inserts a new row.</p>\n<pre><code class=\"lang-\">UPDATE aws_blog.account_user_profile\n SET playing_time = 600,\n name = 'Brie',\n profile_version = 'pv0',\n account_playing_time = 1300,\n account_version = 'av1',\n create_time = toTimestamp(now()),\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_2'\n IF profile_version = null AND account_version = 'av0'; \nSELECT * \nFROM aws_blog.account_user_profile \nWHERE account_id = 'unique_account_C';\n</code></pre>\n<p>Figure 12 that follows shows output that verifies the state after adding an additional profile. You can see that <code>account_version</code> is incremented and <code>profile_version</code> is initialized as version <code>0</code>. 
The <code>account_playing_time</code> was updated to <code>1300</code> to represent the aggregate of both profiles’ playing_time. Again, the fields <code>create_time</code> and <code>modify_time</code> are omitted in the select statement for readability.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/835d76b7b97948cea82a0652db20fce3_image.png\" alt=\"image.png\" /><br />\nFigure 12: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id</p>\n<h4><a id=\"Modify_an_existing_profile_in_an_account_256\"></a><strong>Modify an existing profile in an account</strong></h4>\n<p>Finally, update the account and an existing profile. You update the account and profile only if conditional checks on <code>profile_version</code> and <code>account_version</code> succeed. The following update increments the <code>playing_time</code> and <code>account_playing_time</code> values by <code>500</code>.</p>\n<pre><code class=\"lang-\">UPDATE aws_blog.account_user_profile\n SET playing_time = 1200,\n profile_version = 'pv1',\n account_playing_time = 1800,\n account_version = 'av2',\n modify_time = toTimestamp(now())\n WHERE account_id = 'unique_account_C' AND profile_id = 'unique_profile_1'\n IF profile_version = 'pv0' AND account_version = 'av1';\n</code></pre>\n<p>Figure 13 that follows shows output verifying the state after modifying an existing profile. You can see that <code>account_version</code> is incremented again and that profile_version of <code>unique_profile_1</code> was incremented as well. The <code>account_playing_time</code> was updated to <code>1800</code> to represent the new aggregate of both profiles’ <code>playing_time</code>. 
Again, the fields <code>create_time</code> and <code>modify_time</code> are omitted in the select statement for readability.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/15ae256636d0470591528aa814e8f2df_image.png\" alt=\"image.png\" /><br />\nFigure 13: Terminal window showing the output of a cql select statement retrieving all the rows for a given account_id</p>\n<h4><a id=\"Monitor_conditional_check_failure_requests_276\"></a><strong>Monitor conditional check failure requests</strong></h4>\n<p>When an LWT condition evaluates to false, the transaction is rejected and the service emits an Amazon CloudWatch metric reflecting the number of failed conditional checks. Developers and system administrators can monitor this behavior at scale with CloudWatch. In CloudWatch you will find a metric under AWS/Cassandra called <code>ConditionalCheckFailedRequests</code>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/735b39a5f0564e4e892ee11412b7c219_image.png\" alt=\"image.png\" /><br />\nFigure 14: Amazon CloudWatch metric displaying the number of ConditionalCheckFailedRequests over a one-hour period</p>\n<h4><a id=\"Estimate_capacity_utilization_283\"></a><strong>Estimate capacity utilization</strong></h4>\n<p>Now that you can monitor LWTs, you can better estimate capacity utilization. All writes require <code>LOCAL_QUORUM</code> consistency, and there is no additional charge for using LWTs. The difference from non-LWTs is that an LWT still consumes write capacity units when its condition check results in <code>FALSE</code>. The number of write capacity units consumed depends on the size of the row. If the row size is 2 KB, the failed conditional write consumes two write capacity units. If the row doesn’t currently exist in the table, the operation consumes one write capacity unit. 
By monitoring <code>ConditionalCheckFailedRequests</code> you can determine the capacity consumed by LWT condition check failures.</p>\n<h4><a id=\"Clean_up_Amazon_Keyspaces_resources_287\"></a><strong>Clean up Amazon Keyspaces resources</strong></h4>\n<p>To finish, you can clean up the resources used in this post by dropping the keyspace <code>aws_blog</code>. The following command drops all tables in the keyspace before deleting the keyspace itself. If you enabled PITR when creating the table, you can restore the <code>aws_blog.account_user_profile</code> table for the next 35 days.</p>\n<pre><code class=\"lang-\">DROP KEYSPACE aws_blog;\n</code></pre>\n<h4><a id=\"Conclusion_295\"></a><strong>Conclusion</strong></h4>\n<p>In this post you learned about the improved performance characteristics of the Amazon Keyspaces LWT API, advanced design patterns, and operational best practices. LWTs in Amazon Keyspaces have single-digit millisecond latencies and allow you to mix and match LWT and non-LWT operations without trading off isolation. You can use LWTs to implement advanced design patterns such as optimistic locking. You can use CloudWatch to monitor successful LWT operations, latencies, and conditional check failures.</p>\n<p>For more information about LWTs and Amazon Keyspaces, check out the <a href=\"https://aws.amazon.com/keyspaces/scaling-data/\" target=\"_blank\">Scaling Data</a> video resources on the official Amazon Keyspaces product page. In these videos we cover Amazon Keyspaces use cases, serverless architecture, and application modernization.</p>\n<h4><a id=\"About_the_Authors_301\"></a><strong>About the Authors</strong></h4>\n<p><img src=\"https://dev-media.amazoncloud.cn/2cb5da0c73ef4056a3c095e81cb307a0_image.png\" alt=\"image.png\" /></p>\n<p><strong>Michael Raney</strong> is a Senior Specialist Solutions Architect based in New York and leads the field for Amazon Keyspaces. He works with customers to modernize their legacy database workloads to a serverless architecture. 
Michael has spent over a decade building distributed systems for high scale and low latency.</p>\n"}