{"value":"[Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) is a service from Amazon Web Services (AWS) that makes it easy to build machine learning models and deploy them in the cloud. If a website is running a SageMaker model, site visitors can upload data and receive the results of running that data through the model.\n\nAll transmissions to and from a SageMaker model are encrypted, but in some cases, customers may be wary of having sensitive data decrypted for analysis. Privacy-preserving machine learning (PPML) is a class of techniques that let machine learning models compute directly on encrypted data, returning encrypted results. Only the person who encrypted the input data can decrypt the result.\n\nAn open-source prototype of the privacy-preserving version of XGBoost can be ++[found](https://github.com/awslabs/privacy-preserving-xgboost-inference)++ on GitHub.\n\nAt the NeurIPS Workshop on Privacy-Preserving Machine Learning in December, we will present a privacy-preserving ++[version](https://www.amazon.science/publications/privacy-preserving-xgboost-inference)++ of a machine learning algorithm called XGBoost, which produces models known as gradient-boosted decision trees. XGBoost is one of the most popular machine learning algorithms offered through [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail).\n\n![image.png](https://dev-media.amazoncloud.cn/f85ee0af919640288f5324661fc5b1df_image.png)\n\nA model consisting of two decision trees, one that imposes an age criterion and one that imposes a height criterion. For the input [age = 36, height = 175], the model’s output is 2 + 0.7 = 2.7.\n\nCREDIT: GLYNIS CONDON\n\nPPML is rarely used because it adds computational overhead, which often makes the resulting models too slow to be practical. 
But we tested our algorithm on a model that’s roughly 500 kilobytes in size and found that the privacy-preserving version takes around 0.4 seconds to produce a result (compared to 1 millisecond for the unencrypted version). That’s consistent with many cloud-based machine learning tasks currently initiated from smartphones.\n\nWe have open-sourced our prototype, and the code is ++[available](https://github.com/awslabs/privacy-preserving-xgboost-inference)++ in Amazon Web Services Labs.\n\nXGBoost (for Extreme Gradient Boosting) is an optimized, distributed, gradient-boosting machine learning framework designed to be highly efficient and flexible. Given a set of training data, XGBoost produces a set of parallel classification and regression trees. Each tree evaluates an input query by making a branching decision at each node of the tree until a leaf node is reached, at which point it outputs a numerical score. The results across all of the trees are summed to give an overall output.\n\nTo design a privacy-preserving XGBoost inference algorithm, we use several cryptographic tools. One is order-preserving encryption (OPE), which allows data to be encrypted in such a way that the ciphertexts (the encrypted versions of the data) preserve the order of the plaintexts (the unencrypted versions). That is, for any plaintexts *a* and *b*, *a* > *b* if and only if the ciphertext of *a* is greater than that of *b* (Enc(*a*) > Enc(*b*)).\n\n![image.png](https://dev-media.amazoncloud.cn/fe4db1030af04de992ae4334d8fa7f11_image.png)\n\nAn encrypted regression tree.\n\nCREDIT: XIANRUI MENG\n\nAnother tool is *pseudo-random functions (PRFs)*, which allow us to construct functions that are essentially indistinguishable from a random function. PRFs are used to generate pseudorandom “feature names” for the values that the tree nodes are testing, and OPE is used to encrypt the values those features are being tested against. 
\n\nWe also use *additively homomorphic encryption (AHE)*, which is a semantically secure homomorphic encryption scheme that allows us to evaluate the addition function over ciphertexts. That is, there exists a homomorphic evaluation function that, given two ciphertexts, can compute the encryption of the sum of the corresponding plaintexts (Eval(Enc(*a*), Enc(*b*), +) = Enc(*a*+*b*)). AHE allows us to combine the outputs of the regression trees to obtain the final encrypted result.\n\nDuring operation, the site visitor’s computer encrypts a plaintext query with OPE. It then sends the encrypted query to the server hosting the PPML model. The server evaluates each tree on the encrypted query and obtains a set of encrypted leaf values. Then the server homomorphically sums all encrypted leaf values and returns the result to the visitor’s computer, which can decrypt it to obtain the final prediction. In our paper we show that the server can homomorphically compute the *softmax* function, commonly used in XGBoost, as well as the sum.\n\nFuture work includes support for more learning parameters in the privacy-preserving version and the use of secure multiparty computation that executes secure comparisons for each decision node in an encrypted regression tree.\n\nABOUT THE AUTHOR\n\n\n#### **[Xianrui Meng](https://www.amazon.science/author/xianrui-meng)**\n\n\nXianrui Meng is an applied scientist with Amazon Web Services' Cryptography group.","render":"<p>Amazon SageMaker is a service from Amazon Web Services (AWS) that makes it easy to build machine learning models and deploy them in the cloud. If a website is running a SageMaker model, site visitors can upload data and receive the results of running that data through the model.</p>\n<p>All transmissions to and from a SageMaker model are encrypted, but in some cases, customers may be wary of having sensitive data decrypted for analysis. 
Privacy-preserving machine learning (PPML) is a class of techniques that let machine learning models compute directly on encrypted data, returning encrypted results. Only the person who encrypted the input data can decrypt the result.</p>\n<p>An open-source prototype of the privacy-preserving version of XGBoost can be <ins><a href=\\"https://github.com/awslabs/privacy-preserving-xgboost-inference\\" target=\\"_blank\\">found</a></ins> on GitHub.</p>\n<p>At the NeurIPS Workshop on Privacy-Preserving Machine Learning in December, we will present a privacy-preserving <ins><a href=\\"https://www.amazon.science/publications/privacy-preserving-xgboost-inference\\" target=\\"_blank\\">version</a></ins> of a machine learning algorithm called XGBoost, which produces models known as gradient-boosted decision trees. XGBoost is one of the most popular machine learning algorithms offered through Amazon SageMaker.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/f85ee0af919640288f5324661fc5b1df_image.png\\" alt=\\"image.png\\" /></p>\n<p>A model consisting of two decision trees, one that imposes an age criterion and one that imposes a height criterion. For the input [age = 36, height = 175], the model’s output is 2 + 0.7 = 2.7.</p>\n<p>CREDIT: GLYNIS CONDON</p>\n<p>PPML is rarely used because it adds computational overhead, which often makes the resulting models too slow to be practical. But we tested our algorithm on a model that’s roughly 500 kilobytes in size and found that the privacy-preserving version takes around 0.4 seconds to produce a result (compared to 1 millisecond for the unencrypted version). 
That’s consistent with many cloud-based machine learning tasks currently initiated from smartphones.</p>\n<p>We have open-sourced our prototype, and the code is <ins><a href=\\"https://github.com/awslabs/privacy-preserving-xgboost-inference\\" target=\\"_blank\\">available</a></ins> in Amazon Web Services Labs.</p>\n<p>XGBoost (for Extreme Gradient Boosting) is an optimized, distributed, gradient-boosting machine learning framework designed to be highly efficient and flexible. Given a set of training data, XGBoost produces a set of parallel classification and regression trees. Each tree evaluates an input query by making a branching decision at each node of the tree until a leaf node is reached, at which point it outputs a numerical score. The results across all of the trees are summed to give an overall output.</p>\n<p>To design a privacy-preserving XGBoost inference algorithm, we use several cryptographic tools. One is order-preserving encryption (OPE), which allows data to be encrypted in such a way that the ciphertexts (the encrypted versions of the data) preserve the order of the plaintexts (the unencrypted versions). That is, for any plaintexts <em>a</em> and <em>b</em>, <em>a</em> > <em>b</em> if and only if the ciphertext of <em>a</em> is greater than that of <em>b</em> (Enc(<em>a</em>) > Enc(<em>b</em>)).</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/fe4db1030af04de992ae4334d8fa7f11_image.png\\" alt=\\"image.png\\" /></p>\n<p>An encrypted regression tree.</p>\n<p>CREDIT: XIANRUI MENG</p>\n<p>Another tool is <em>pseudo-random functions (PRFs)</em>, which allow us to construct functions that are essentially indistinguishable from a random function. 
PRFs are used to generate pseudorandom “feature names” for the values that the tree nodes are testing, and OPE is used to encrypt the values those features are being tested against.</p>\\n<p>We also use <em>additively homomorphic encryption (AHE)</em>, which is a semantically secure homomorphic encryption scheme that allows us to evaluate the addition function over ciphertexts. That is, there exists a homomorphic evaluation function that, given two ciphertexts, can compute the encryption of the sum of the corresponding plaintexts (Eval(Enc(<em>a</em>), Enc(<em>b</em>), +) = Enc(<em>a</em>+<em>b</em>)). AHE allows us to combine the outputs of the regression trees to obtain the final encrypted result.</p>\\n<p>During operation, the site visitor’s computer encrypts a plaintext query with OPE. It then sends the encrypted query to the server hosting the PPML model. The server evaluates each tree on the encrypted query and obtains a set of encrypted leaf values. Then the server homomorphically sums all encrypted leaf values and returns the result to the visitor’s computer, which can decrypt it to obtain the final prediction. In our paper we show that the server can homomorphically compute the <em>softmax</em> function, commonly used in XGBoost, as well as the sum.</p>\\n<p>Future work includes support for more learning parameters in the privacy-preserving version and the use of secure multiparty computation that executes secure comparisons for each decision node in an encrypted regression tree.</p>\n<p>ABOUT THE AUTHOR</p>\n<h4><a id=\\"Xianrui_Menghttpswwwamazonscienceauthorxianruimeng_39\\"></a><strong><a href=\\"https://www.amazon.science/author/xianrui-meng\\" target=\\"_blank\\">Xianrui Meng</a></strong></h4>\n<p>Xianrui Meng is an applied scientist with Amazon Web Services’ Cryptography group.</p>\n"}