Analyze and tag assets stored in Veeva Vault PromoMats using Amazon AppFlow and Amazon AI Services

In a previous [post](https://aws.amazon.com/de/blogs/machine-learning/analyzing-and-tagging-assets-stored-in-veeva-vault-promomats-using-amazon-ai-services/), we talked about analyzing and tagging assets stored in Veeva Vault PromoMats using Amazon AI services and the Veeva Vault Platform’s APIs. In this post, we explore how to use [Amazon AppFlow](https://aws.amazon.com/appflow/), a fully managed integration service that enables you to securely transfer data from software as a service (SaaS) applications like Veeva Vault to AWS. The [Amazon AppFlow Veeva connector](https://docs.aws.amazon.com/appflow/latest/userguide/veeva.html) allows you to connect your AWS environment to the Veeva ecosystem quickly, reliably, and cost-effectively, so you can analyze the rich content stored in Veeva Vault at scale.

The Amazon AppFlow Veeva connector is the first Amazon AppFlow connector that supports automatic transfer of [Veeva documents](https://aws.amazon.com/about-aws/whats-new/2021/06/amazon-appflow-now-supports-documents-with-veeva/). It lets you choose between transferring the latest version of documents (the Steady State version in Veeva terms) and all versions. You can also import document metadata.

With a few clicks, you can set up a managed connection and choose the Veeva Vault documents and metadata to import. You can further adjust the import behavior by mapping source fields to destination fields, and you can add filters based on document type and subtype, classification, products, country, site, and more. Lastly, you can add validation and manage on-demand and scheduled flow triggers.

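If you prefer to script the connection setup rather than click through the console, the same managed connection can be created through the Amazon AppFlow API. The following is a minimal sketch using boto3; the profile name, Vault URL, and inline credentials are placeholders for illustration only and are not part of the original solution (in practice, keep Vault credentials in AWS Secrets Manager).

```python
import boto3

appflow = boto3.client("appflow")

# Minimal sketch: create a Veeva connector profile programmatically.
# The profile name and Vault URL are placeholders; never hard-code real credentials.
response = appflow.create_connector_profile(
    connectorProfileName="veeva-promomats-connection",  # hypothetical name
    connectorType="Veeva",
    connectionMode="Public",
    connectorProfileConfig={
        "connectorProfileProperties": {
            "Veeva": {"instanceUrl": "https://myvault.veevavault.com"}  # placeholder
        },
        "connectorProfileCredentials": {
            "Veeva": {"username": "vault-user", "password": "vault-password"}  # placeholders
        },
    },
)
print(response["connectorProfileArn"])
```
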
You can use the Amazon AppFlow Veeva connector for various use cases, ranging from Veeva Vault PromoMats to other Veeva Vault solutions such as QualityDocs, eTMF, or Regulatory Information Management (RIM). The following are some examples:

- **Data synchronization** – You can use the connector to establish consistency and harmonization between data in a source Veeva Vault and any downstream systems over time. For example, you can share Veeva PromoMats marketing assets with Salesforce, or share Veeva QualityDocs such as standard operating procedures (SOPs) or specifications with cached websites that are searchable from tablets on the manufacturing floor.
- **Anomaly detection** – You can send Veeva PromoMats documents to [Amazon Lookout for Metrics](https://aws.amazon.com/lookout-for-metrics/) for anomaly detection. You can also use the connector with Vault RIM to check artwork, commercial labels, templates, or patient leaflets before importing them for printing into enterprise labeling solutions such as Loftware.
- **Data lake hydration** – The connector is an effective tool for replicating structured or unstructured data in order to support the creation and hydration of data lakes. For example, you can use the connector to extract standardized study information from protocols stored in Vault RIM and expose it downstream to medical analytics and insights teams.
- **Translations** – The connector is useful for sending artwork, clinical documents, marketing materials, or study protocols for translation into native languages by departments such as packaging, clinical trials, or regulatory submissions.

This post focuses on how you can use [Amazon AI services](https://aws.amazon.com/machine-learning/ai-services/) in combination with Amazon AppFlow to analyze content stored in Veeva Vault PromoMats, automatically extract tag information, and ultimately feed this information back into the Veeva Vault system. The post discusses the overall architecture, the steps to deploy the solution and dashboard, and a use case of asset metadata tagging. For the proof of concept code base for this use case, see the [GitHub repository](https://github.com/aws-samples/aws-ai-appflow-veeva-integration).

### **Solution overview**

The following diagram illustrates the updated solution architecture.

![image.png](https://dev-media.amazoncloud.cn/cd4eb27dc3cd41a1880886d280f6c8c8_image.png)

Previously, to import assets from Veeva Vault, you had to write your own custom code logic using the [Veeva Vault APIs](https://developer.veevavault.com/api/21.3/) to poll for changes and import the data into [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3). This could be a manual, time-consuming process, in which you had to account for API limitations, failures, and retries, as well as for scaling to an unlimited number of assets. The updated solution uses Amazon AppFlow to abstract away the complexity of maintaining a custom Veeva-to-Amazon S3 data import pipeline.

As mentioned in the introduction, Amazon AppFlow is an easy-to-use, no-code, self-service tool that uses point-and-click configurations to move data easily and securely between various SaaS applications and AWS services. AppFlow lets you pull data (objects and documents) from supported sources and write that data to various supported destinations. The source or destination can be a SaaS application or an AWS service such as Amazon S3, [Amazon Redshift](http://aws.amazon.com/redshift), or Lookout for Metrics. In addition to the no-code interface, Amazon AppFlow supports configuration via its API, the AWS CLI, and [AWS CloudFormation](http://aws.amazon.com/cloudformation).

A flow in Amazon AppFlow describes how data is to be moved, including source details, destination details, flow trigger conditions (on demand, on event, or scheduled), and data processing tasks such as checkpointing, field validation, or masking. When triggered, Amazon AppFlow runs the flow: it fetches the source data (generally through the source application’s public APIs), runs the data processing tasks, and transfers the processed data to the destination.

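As a quick illustration of the API route, the sketch below uses boto3 to inspect a flow, run it on demand, and list its most recent executions. The flow name matches the ```veeva-aws-connector``` flow deployed in the next section; everything else is illustrative.

```python
import boto3

appflow = boto3.client("appflow")

FLOW_NAME = "veeva-aws-connector"  # flow created by the solution's CloudFormation template

# Inspect the flow definition (status, trigger type).
flow = appflow.describe_flow(flowName=FLOW_NAME)
print(flow["flowStatus"], flow["triggerConfig"]["triggerType"])

# Start an on-demand run, then check the most recent execution records.
appflow.start_flow(flowName=FLOW_NAME)
runs = appflow.describe_flow_execution_records(flowName=FLOW_NAME, maxResults=5)
for record in runs["flowExecutions"]:
    print(record["executionId"], record["executionStatus"])
```
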
In this example, you deploy a preconfigured flow using a CloudFormation template. The following screenshot shows the preconfigured ```veeva-aws-connector``` flow that the solution template creates automatically on the Amazon AppFlow console.

![image.png](https://dev-media.amazoncloud.cn/78663a23d2864a2787539c6559944440_image.png)

The flow uses Veeva as a source and is configured to import Veeva Vault component objects. Both the metadata and the source files are needed in order to keep track of the assets that have been processed and to push tags back onto the correct corresponding asset in the source system. In this case, only the latest version is imported, and renditions aren’t included.

![image.png](https://dev-media.amazoncloud.cn/a23da3d5fa574aa9a4f70cf27e86e77e_image.png)

The flow’s destination also needs to be configured. In the following screenshot, we define a file format and folder structure for the S3 bucket that was created as part of the CloudFormation template.

![image.png](https://dev-media.amazoncloud.cn/d3a5c44ebf9f4b85b99efa13e6c970f6_image.png)

Finally, the flow is triggered on demand for demonstration purposes. You can modify this so that the flow runs on a schedule, with a maximum granularity of 1 minute. When triggered on a schedule, the transfer mode changes automatically from a full transfer to an incremental transfer mode, and you specify a source timestamp field for tracking the changes. For the tagging use case, we have found the **Last Modified Date** setting to be the most suitable.

![image.png](https://dev-media.amazoncloud.cn/ae831bba53e1453c9a4c727a0ae69ea4_image.png)

Amazon AppFlow is then integrated with [Amazon EventBridge](https://aws.amazon.com/eventbridge/) to publish events whenever a flow run is complete.

For better resiliency, the ```AVAIAppFlowListener``` [AWS Lambda](http://aws.amazon.com/lambda) function is wired into EventBridge. When an Amazon AppFlow event is received, the function verifies that the specific flow run has completed successfully, reads the metadata of all assets imported by that flow run, and pushes individual document metadata into an [Amazon Simple Queue Service](https://aws.amazon.com/sqs/) (Amazon SQS) queue. Using Amazon SQS provides loose coupling between the producer and processor sections of the architecture, and also allows you to deploy changes to the processor section without stopping the incoming updates.

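For orientation, here is a minimal sketch of what an ```AVAIAppFlowListener```-style handler could look like. It assumes an EventBridge event for a completed flow run and a FIFO queue with content-based deduplication enabled; the event field names, the status check, and the ```read_metadata_for_run``` helper are illustrative and not taken from the solution’s source code.

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # SQS FIFO queue created by the stack


def handler(event, context):
    """Triggered by EventBridge when an AppFlow flow run finishes."""
    detail = event.get("detail", {})

    # Only act on successful runs of the Veeva flow (field names are illustrative;
    # inspect a real AppFlow end-of-run event to confirm them).
    if detail.get("flow-name") != "veeva-aws-connector":
        return
    if "Successful" not in str(detail.get("status", "")):
        return

    # In the real solution, the imported metadata for this run is read from Amazon S3;
    # here we assume it has already been parsed into a list of dictionaries.
    documents = read_metadata_for_run(detail)  # hypothetical helper

    for doc in documents:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(doc),
            MessageGroupId="veeva-assets",  # required for FIFO queues
        )


def read_metadata_for_run(detail):
    # Placeholder for reading the flow run's metadata output from S3.
    return []
```
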
A second poller function (```AVAIQueuePoller```) reads the SQS queue at frequent intervals (every minute) and processes the incoming assets. For an even faster reaction time from the Lambda function, you can replace the CloudWatch rule by configuring Amazon SQS as a trigger for the function.

Depending on the incoming message type, the solution uses various AWS AI services to derive insights from your data. Some examples include:

- **Text files** – The function uses the [DetectEntities](https://docs.aws.amazon.com/comprehend/latest/dg/extracted-med-info.html) operation of [Amazon Comprehend Medical](https://aws.amazon.com/comprehend/medical/), a natural language processing (NLP) service that makes it easy to use ML to extract relevant medical information from unstructured text. This operation detects entities in categories like ```Anatomy```, ```Medical_Condition```, ```Medication```, ```Protected_Health_Information```, and ```Test_Treatment_Procedure```. The resulting output is filtered for ```Protected_Health_Information```, and the remaining information, along with confidence scores, is flattened and inserted into an [Amazon DynamoDB](https://aws.amazon.com/dynamodb/) table (see the sketch after this list). This information is plotted on the OpenSearch Kibana cluster. In real-world applications, you can also use the Amazon Comprehend Medical [ICD-10-CM or RxNorm](https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-icd-10-cm-rxnorm-ontology-linking-amazon-comprehend-medical/) feature to link the detected information to medical ontologies so downstream healthcare applications can use it for further analysis.
- **Images** – The function uses the [DetectLabels](https://docs.aws.amazon.com/rekognition/latest/dg/labels-detect-labels-image.html) method of [Amazon Rekognition](https://aws.amazon.com/rekognition/) to detect labels in the incoming image. These labels can act as tags to identify the rich information buried in your images, such as information about commercial artwork and clinical labels. If labels like ```Human``` or ```Person``` are detected with a confidence score of more than 80%, the code uses the [DetectFaces](https://docs.aws.amazon.com/rekognition/latest/dg/faces-detect-images.html) method to look for key facial features such as eyes, nose, and mouth to detect faces in the input image. Amazon Rekognition delivers all this information with associated confidence scores, which are flattened and stored in the DynamoDB table.
- **Voice recordings** – For audio assets, the code uses the [StartTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/dg/API_StartTranscriptionJob.html) asynchronous method of [Amazon Transcribe](https://aws.amazon.com/transcribe/) to transcribe the incoming audio to text, passing in a unique identifier as the ```TranscriptionJobName```. The code assumes the audio language to be English (US), but you can modify it to use the language information coming from Veeva Vault. The code calls the [GetTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/dg/API_GetTranscriptionJob.html) method in a loop, passing in the same unique identifier as the ```TranscriptionJobName```, until the job is complete. Amazon Transcribe delivers the output file to an S3 bucket; the code reads the file and then deletes it. The code then calls the text processing workflow (as discussed earlier) to extract entities from the transcribed audio.
- **Scanned documents (PDFs)** – A large percentage of life sciences assets are represented as PDFs, ranging from scientific journals and research papers to drug labels. [Amazon Textract](https://aws.amazon.com/textract/) is a service that automatically extracts text and data from scanned documents. The code uses the [StartDocumentTextDetection](https://docs.aws.amazon.com/textract/latest/dg/async-analyzing-with-sqs.html) method to start an asynchronous job to detect text in the document, and then uses the ```JobId``` returned in the response to call [GetDocumentTextDetection](https://docs.aws.amazon.com/textract/latest/dg/API_GetDocumentTextDetection.html) in a loop until the job is complete. The output JSON structure contains the lines and words of detected text, along with confidence scores for each element it identifies, so you can make informed decisions about how to use the results. The code processes the JSON structure to recreate the text blurb and calls the text processing workflow to extract entities from the text.

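As a concrete example of the text branch from the list above, the following sketch detects medical entities, drops protected health information, and writes the remaining tags to DynamoDB. The table name and attribute names are assumptions for illustration; the service also limits input size, so long documents may need to be chunked before calling it.

```python
import os
from decimal import Decimal

import boto3

comprehend_medical = boto3.client("comprehendmedical")
# Illustrative table name; the real solution defines its own schema.
table = boto3.resource("dynamodb").Table(os.environ.get("TABLE_NAME", "AVAIResults"))


def tag_text_asset(document_id: str, text: str) -> None:
    """Detect medical entities in text and store non-PHI results as tags."""
    response = comprehend_medical.detect_entities_v2(Text=text)

    for entity in response["Entities"]:
        # Drop protected health information before persisting anything.
        if entity["Category"] == "PROTECTED_HEALTH_INFORMATION":
            continue

        table.put_item(
            Item={
                "DocumentID": document_id,
                "Tag": entity["Text"],
                "Category": entity["Category"],
                # DynamoDB requires Decimal instead of float for numbers.
                "Score": Decimal(str(round(entity["Score"], 4))),
            }
        )
```
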
A DynamoDB table stores all the processed data. The solution uses [DynamoDB Streams and Lambda triggers](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html) (```AVAIPopulateES```) to populate data into an OpenSearch Kibana cluster. The ```AVAIPopulateES``` function runs for every update, insert, and delete operation that happens in the DynamoDB table, and inserts one corresponding record into the OpenSearch index. You can visualize these records using Kibana.

To close the feedback loop, the ```AVAICustomFieldPopulator``` Lambda function was created. It’s triggered by events in the DynamoDB stream of the metadata DynamoDB table. For every ```DocumentID``` in the DynamoDB records, the function tries to upsert tag information into a predefined custom field property of the asset with the corresponding ID in Veeva, using the Veeva API. To avoid inserting noise into the custom field, the Lambda function filters out any tags that have been identified with a confidence score lower than 0.9. Failed requests are forwarded to a dead-letter queue (DLQ) for manual inspection or automatic retry.

This solution offers a serverless, pay-as-you-go approach to process, tag, and enable comprehensive searches on your digital assets. Additionally, each managed component has high availability built in through automatic deployment across multiple Availability Zones. For [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) (successor to Amazon Elasticsearch Service), you can choose the [three-AZ option](https://aws.amazon.com/blogs/database/increase-availability-for-amazon-elasticsearch-service-by-deploying-in-three-availability-zones-2/) to provide better availability for your domains.

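To make the feedback loop described above more concrete, here is a heavily hedged sketch of an ```AVAICustomFieldPopulator```-style handler. The DynamoDB attribute names, the Veeva authentication call, the API version, and the ```autotags__c``` field name are illustrative assumptions; consult the Veeva Vault API documentation and your Vault configuration for the real endpoints and field names.

```python
import os

import requests

VAULT_URL = os.environ["VAULT_URL"]        # e.g. https://myvault.veevavault.com (placeholder)
VAULT_USER = os.environ["VAULT_USER"]
VAULT_PASSWORD = os.environ["VAULT_PASSWORD"]
API_VERSION = "v21.3"                      # illustrative API version
MIN_CONFIDENCE = 0.9


def get_session_id() -> str:
    # Veeva Vault username/password authentication; returns a session ID.
    resp = requests.post(
        f"{VAULT_URL}/api/{API_VERSION}/auth",
        data={"username": VAULT_USER, "password": VAULT_PASSWORD},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["sessionId"]


def handler(event, context):
    """Triggered by the DynamoDB stream of the results table."""
    session_id = get_session_id()

    for record in event["Records"]:
        if record["eventName"] == "REMOVE":
            continue
        image = record["dynamodb"]["NewImage"]

        doc_id = image["DocumentID"]["S"]
        tag = image["Tag"]["S"]
        score = float(image["Score"]["N"])

        # Skip low-confidence tags so we don't add noise to the custom field.
        if score < MIN_CONFIDENCE:
            continue

        # Upsert the tag into a custom field on the Veeva document.
        # "autotags__c" is a hypothetical custom field name.
        resp = requests.put(
            f"{VAULT_URL}/api/{API_VERSION}/objects/documents/{doc_id}",
            headers={"Authorization": session_id},
            data={"autotags__c": tag},
            timeout=30,
        )
        resp.raise_for_status()
```
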
### **Prerequisites**

For this walkthrough, you should have the following prerequisites:

- An [AWS account](https://signin.aws.amazon.com/signin?redirect_uri=https%3A%2F%2Fportal.aws.amazon.com%2Fbilling%2Fsignup%2Fresume&client_id=signup) with appropriate [AWS Identity and Access Management](http://aws.amazon.com/iam) (IAM) permissions to launch the CloudFormation template
- Appropriate access credentials for a Veeva Vault PromoMats domain (domain URL, user name, and password)
- A custom content tag defined in Veeva for the digital assets that you want to be tagged (as an example, we created the ```AutoTags``` custom content tag)
- Digital assets in the PromoMats Vault accessible to the preceding credentials

### **Deploy your solution**

You use a CloudFormation stack to deploy the solution. The stack creates all the necessary resources, including:

- An S3 bucket to store the incoming assets.
- An Amazon AppFlow flow to automatically import assets into the S3 bucket.
- An EventBridge rule and a Lambda function to react to the events generated by Amazon AppFlow (```AVAIAppFlowListener```).
- An SQS FIFO queue to act as loose coupling between the listener function (```AVAIAppFlowListener```) and the poller function (```AVAIQueuePoller```).
- A DynamoDB table to store the output of the Amazon AI services.
- An Amazon OpenSearch Kibana (ELK) cluster to visualize the analyzed tags.
- A Lambda function to push identified tags back into Veeva (```AVAICustomFieldPopulator```), with a corresponding DLQ.
- The required Lambda functions:
  - **AVAIAppFlowListener** – Triggered by events pushed by Amazon AppFlow to EventBridge. Used for flow run validation and for pushing a message to the SQS queue.
  - **AVAIQueuePoller** – Triggered every minute. Used for polling the SQS queue, processing the assets using Amazon AI services, and populating the DynamoDB table.
  - **AVAIPopulateES** – Triggered when there is an update, insert, or delete on the DynamoDB table. Used for capturing changes from DynamoDB and populating the ELK cluster.
  - **AVAICustomFieldPopulator** – Triggered when there is an update, insert, or delete on the DynamoDB table. Used for feeding tag information back into Veeva.
- The [Amazon CloudWatch Events](http://aws.amazon.com/cloudwatch) rules that trigger the ```AVAIQueuePoller``` function. These triggers are in the ```DISABLED``` state by default.
- The required IAM roles and policies for interacting with EventBridge and the AI services in a scoped-down manner.

To get started, complete the following steps:

1. Sign in to the [AWS Management Console](http://aws.amazon.com/console) with an account that has the prerequisite IAM permissions.
2. Choose [Launch Stack]() and open it in a new tab:
![image.png](https://dev-media.amazoncloud.cn/c4147b6ac22d4b3f89a2eca3275fdc36_image.png)
3. On the **Create stack** page, choose **Next**.

![image.png](https://dev-media.amazoncloud.cn/459c6f380d5b4ae5ba07c864d1075cf3_image.png)

4. On the **Specify stack details** page, enter a name for the stack.
5. Enter values for the parameters.
6. Choose **Next**.

![image.png](https://dev-media.amazoncloud.cn/afe97497cafe473783e8d5b92edb1a17_image.png)

7. On the **Configure stack options** page, leave everything as the default and choose **Next**.

![image.png](https://dev-media.amazoncloud.cn/b6f65a3980ce48869a0c7e910c09a036_image.png)

8. On the **Review** page, in the **Capabilities and transforms** section, select the three check boxes.
9. Choose **Create stack**.

![image.png](https://dev-media.amazoncloud.cn/c3cff2f77dc643009b22a01346ad8786_image.png)

10. Wait for the stack creation to complete. You can examine various events from the stack creation process on the **Events** tab.
11. After the stack creation is complete, you can look at the **Resources** tab to see all the resources the CloudFormation template created.
12. On the **Outputs** tab, copy the value of ```ESDomainAccessPrincipal```.

This is the ARN of the IAM role that the ```AVAIPopulateES``` function assumes. You use it later to configure access to the Amazon OpenSearch Service domain.

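If you prefer to read the stack outputs programmatically instead of copying them from the console, a short boto3 snippet such as the following works; the stack name is a placeholder for whatever you entered in step 4.

```python
import boto3

cloudformation = boto3.client("cloudformation")

STACK_NAME = "veeva-ai-tagging"  # placeholder: the stack name you entered in step 4

stack = cloudformation.describe_stacks(StackName=STACK_NAME)["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

print("IAM role for AVAIPopulateES:", outputs.get("ESDomainAccessPrincipal"))
print("OpenSearch domain endpoint:", outputs.get("ESDomainEndPoint"))
```
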
### **Set up Amazon OpenSearch Service and Kibana**

This section walks you through securing your Amazon OpenSearch Service cluster and installing a local proxy to access Kibana securely.

1. On the Amazon OpenSearch Service console, select the domain that was created by the template.
2. On the **Actions** menu, choose **Modify access policy**.

![image.png](https://dev-media.amazoncloud.cn/88a8a3c804ef41a0b1bb7b068e6d880c_image.png)

3. For **Domain access policy**, choose **Custom access policy**.

![image.png](https://dev-media.amazoncloud.cn/de820077e86b4ebbb637a09623897af4_image.png)

4. In the **Access policy will be cleared** pop-up window, choose **Clear and continue**.

![image.png](https://dev-media.amazoncloud.cn/7e7403344b7249ac8da16b5a7240821e_image.png)

5. On the next page, configure the following statements to lock down access to the Amazon OpenSearch Service domain:

a. **Allow IPv4 address** – Your IP address.
b. **Allow IAM ARN** – The value of ```ESDomainAccessPrincipal``` you copied earlier.

![image.png](https://dev-media.amazoncloud.cn/bd7401d747f04789bed4ecf3ff3b6833_image.png)

6. Choose **Submit**.

This creates an access policy that grants access to the ```AVAIPopulateES``` function and allows Kibana access from your IP address. For more information about scoping down your access policy, see [Configuring access policies](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html#createdomain-configure-access-policies).

7. Wait for the domain status to show as ```Active```.
8. On the Amazon EventBridge console, under **Events**, choose **Rules**. You can see the two rules that the CloudFormation template created.
9. Select the ```AVAIQueuePollerSchedule``` rule and choose **Enable**.

![image.png](https://dev-media.amazoncloud.cn/b881c244095549e09f74c0f93934eeec_image.png)

In 5–8 minutes, the data should start flowing in and entities are created in the Amazon OpenSearch Service cluster. You can now visualize these entities in Kibana. To do this, you use an open-source proxy called [aws-es-kibana](https://github.com/santthosh/aws-es-kibana). To run the proxy on your computer, enter the following command:

```
aws-es-kibana your_OpenSearch_domain_endpoint
```

You can find the domain endpoint on the **Outputs** tab of the CloudFormation stack under ```ESDomainEndPoint```. You should see the following output:

![image.png](https://dev-media.amazoncloud.cn/881ce068b2cc4cc8affc8341995d9542_image.png)

### **Create visualizations and analyze tagged content**

Please refer to the original [blog post](https://aws.amazon.com/de/blogs/machine-learning/analyzing-and-tagging-assets-stored-in-veeva-vault-promomats-using-amazon-ai-services/).

#### **Clean up**

To avoid incurring future charges, delete the resources when they’re not in use. You can easily delete all the resources by deleting the associated CloudFormation stack. Note that you need to empty the created S3 buckets of content in order for the stack deletion to succeed.

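If you’d rather script the cleanup, the following sketch empties the asset bucket (including any object versions) and then deletes the stack. The bucket and stack names are placeholders; take the real bucket name from the stack’s **Resources** tab.

```python
import boto3

STACK_NAME = "veeva-ai-tagging"          # placeholder: use your stack name
BUCKET_NAME = "your-asset-bucket-name"   # placeholder: see the stack's Resources tab

# Empty the bucket first; CloudFormation cannot delete a non-empty bucket.
bucket = boto3.resource("s3").Bucket(BUCKET_NAME)
bucket.objects.all().delete()
bucket.object_versions.delete()  # effectively a no-op if versioning was never enabled

# Then delete the stack and wait until the deletion finishes.
cloudformation = boto3.client("cloudformation")
cloudformation.delete_stack(StackName=STACK_NAME)
cloudformation.get_waiter("stack_delete_complete").wait(StackName=STACK_NAME)
```
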
### **Conclusion**

In this post, we demonstrated how you can use Amazon AI services in combination with Amazon AppFlow to extend the functionality of Veeva Vault PromoMats and extract valuable information quickly and easily. The built-in loop-back mechanism allows you to write the tags back into Veeva Vault and enable auto-tagging of your assets, which makes it easier for your team to find and locate assets quickly.

Although no ML output is perfect, it can come very close to human performance and offset a substantial portion of your team’s effort. You can redirect this additional capacity toward value-added tasks, while dedicating a small amount of capacity to checking the output of the ML solution. This solution can also help optimize costs, achieve tagging consistency, and enable quick discovery of existing assets.

Finally, you maintain ownership of your data and choose which AWS services can process, store, and host the content. AWS doesn’t access or use your content for any purpose without your consent, and never uses customer data to derive information for marketing or advertising. For more information, see the [Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/).

You can also extend the functionality of this solution further with additional enhancements. For example, in addition to the AI and ML services in this post, you can easily add your own custom ML models built with [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to the architecture.

If you’re interested in exploring additional use cases for Veeva and AWS, please reach out to your AWS account team.

Veeva Systems has reviewed and approved this content. For additional Veeva Vault-related questions, please contact [Veeva support](https://support.veeva.com/hc/en-us).

#### **About the authors**

![image.png](https://dev-media.amazoncloud.cn/d7bca1edce2f4960bb7dae642c5c4adc_image.png)

[Mayank Thakkar](https://www.linkedin.com/in/thakkarmayank) is Head of AI/ML Business Development, Global Healthcare and Life Sciences at AWS. He has more than 18 years of experience in varied industries like healthcare, life sciences, insurance, and retail, specializing in building serverless, artificial intelligence, and machine learning-based solutions to solve real-world industry problems. At AWS, he works closely with big pharma companies around the world to build cutting-edge solutions and help them along their cloud journey. Apart from work, Mayank, along with his wife, is busy raising two energetic and mischievous boys, Aaryan (6) and Kiaan (4), while trying to keep the house from burning down or getting flooded!

![image.png](https://dev-media.amazoncloud.cn/107f2cb7f08f4d84b2fa6dda18443c09_image.png)

[Anamaria Todor](https://www.linkedin.com/in/anatodor/) is a Senior Solutions Architect based in Copenhagen, Denmark. She saw her first computer when she was 4 years old and has never let go of computer science and engineering since. She has worked in various technical roles, from full-stack developer to data engineer, technical lead, and CTO, at various Danish companies. Anamaria has a bachelor’s degree in Applied Engineering and Computer Science, a master’s degree in Computer Science, and over 10 years of hands-on AWS experience. At AWS, she works closely with healthcare and life sciences companies in the enterprise segment. When she’s not working or playing video games, she’s coaching girls and female professionals in understanding and finding their path through technology.