Building a Self-Hosted Server-Side Event Tracking System with Nginx & Lua

#### **Preface**

What tracking data gets collected generally depends on what the service provider wants to learn about its users. Broadly, it falls into two categories: basic user attributes and behavioral data. Basic attributes include age, gender, device, and the like. Behavioral data covers click and browsing behavior: which user clicked which button at what time, which pages they viewed, how long they stayed, and so on.
Together, these attributes and behaviors can be thought of as a simple message. A message is the unit of data exchanged and transmitted over a network, that is, the block of data a station sends in one go. It carries the complete payload to be sent, and its length varies widely and is not fixed. Put simply, each user action inside the app triggers the upload of a set of data fields.

This post demonstrates how to build a server-side event tracking system with open-source software and Amazon services. The client side is outside the scope of this post.

#### **Software Architecture**

Lua is a lightweight, embeddable scripting language that can be integrated into other languages with very little effort. Lua provides coroutine-based concurrency, that is, asynchronous execution written in a synchronous calling style, which makes the code easier to write, understand, and debug than callback-based concurrency. Lua also supports closures, functions are first-class values that can be passed as arguments, and it implements mark-and-sweep garbage collection.
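To make the closure and coroutine points concrete, here is a small, self-contained sketch with illustrative names only, runnable under any standard Lua interpreter:

```
-- counter() returns a closure: the inner function captures the local
-- variable n and keeps it alive across calls (functions are first-class
-- values, so the closure itself can be passed around like any other value).
local function counter()
    local n = 0
    return function()
        n = n + 1
        return n
    end
end

local next_id = counter()
print(next_id(), next_id()) --> 1   2

-- A coroutine yields control back to its caller and resumes where it
-- left off, so an asynchronous flow reads as straight-line code.
local events = coroutine.create(function()
    for i = 1, 3 do
        coroutine.yield("event-" .. i)
    end
end)

while true do
    local ok, event = coroutine.resume(events)
    if not ok or event == nil then break end
    print(event) --> event-1, event-2, event-3
end
```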
OpenResty® is a high-performance web platform based on Nginx and Lua. It bundles a large number of well-crafted Lua libraries, third-party modules, and most of their dependencies, making it easy to build dynamic web applications, web services, and dynamic gateways that handle very high concurrency and scale extremely well. By assembling well-designed Nginx modules, OpenResty® effectively turns Nginx into a powerful, general-purpose web application platform: web developers and system engineers can use Lua to script the C and Lua modules Nginx supports, and quickly build high-performance web systems capable of 10K, or even 1000K and more, concurrent connections on a single machine.

Nginx & Lua is a common way to centralize server-side log collection; this post applies the same technique to collecting tracking data. The idea is:

- Send the tracking data to Nginx as HTTP requests;
- Parse the request body with a Lua module and forward the data asynchronously to a backend Kafka cluster. The data never touches disk along the way, which saves a great deal of storage and improves efficiency;
- A group of consumers (Spark, for example) eventually reads the data from Kafka and persists it (to S3, for example).

The diagram below shows the software-level architecture.

![image.png](https://dev-media.amazoncloud.cn/d90eca5f50fb4970a0afb617309b5d4e_image.png)

#### **Overall Architecture**

![image.png](https://dev-media.amazoncloud.cn/b54161adda0c4a26a2ac75b5a653fb0a_image.png)

The architecture consists of five building blocks:

- Amazon EKS: Nginx and Lua are deployed as containers on Amazon EKS to take full advantage of its elasticity;
- Amazon MSK: managed Kafka, which lowers the deployment and day-to-day operations burden;
- Amazon EFS: for overall availability and durability; in the unlikely event of a failure on the MSK side, Nginx error logs stored on Amazon EFS preserve the messages as completely as possible;
- Amazon NLB: exposes the service;
- Amazon ECR: stores the container images.

#### **Steps**

Before starting, make sure you have an account that can sign in to the Amazon global-region console with administrator permissions.

#### **Prerequisites**

- A Linux terminal
- Sufficient Amazon account permissions
- Amazon CLI installed
- Docker installed
- eksctl installed
- kubectl installed

##### **Step 1: Create the Amazon VPC and Security Group**

Following this [link](https://docs.aws.amazon.com/zh_cn/vpc/latest/userguide/working-with-vpcs.html#create-vpc-and-other-resources), create one VPC, three public subnets, three private subnets, and the other VPC resources. From here on, *vpcid, publicsubnetid01, publicsubnetid02, publicsubnetid03, privatesubnetid01, privatesubnetid02, privatesubnetid03* stand in for the VPC and subnet resources.

Create a security group for the services that follow. To keep the configuration simple, every resource in this post is placed in the same security group; you can split it into several in your own environment.

```
###
$ aws ec2 create-security-group --group-name my-poc-sg --description "my-poc-sg" --vpc-id vpcid
###
```

Record the Security Group ID; *securitygroupid* stands in for it.

```
###
$ aws ec2 authorize-security-group-ingress \
    --group-id securitygroupid \
    --protocol all \
    --port -1 \
    --source-group securitygroupid
###
```

##### **Step 2: Create the Amazon MSK Cluster**

Create an Amazon MSK cluster to receive messages, and record the broker addresses; *broker01, broker02, broker03* stand in for them.

```
###
$ aws kafka create-cluster \
    --cluster-name "my-poc-msk-cluster" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --kafka-version "2.6.2" \
    --number-of-broker-nodes 3 \
    --encryption-info EncryptionInTransit={ClientBroker=TLS_PLAINTEXT}
###

brokernodegroupinfo.json
~~~
{
    "InstanceType": "kafka.m5.large",
    "BrokerAZDistribution": "DEFAULT",
    "ClientSubnets": [
        "privatesubnetid01",
        "privatesubnetid02",
        "privatesubnetid03"
    ],
    "SecurityGroups": [
        "securitygroupid"
    ],
    "StorageInfo": {
        "EbsStorageInfo": {
            "VolumeSize": 100
        }
    }
}
~~~
```

##### **Step 3: Build the Image**

The build uses the following files:

- Dockerfile
- install.sh
- nginx.conf
- common.conf
- my-poc-send2kafka.lua

```
###
$ mkdir workdir
$ cd workdir
###
```

Create the files with the contents below.

```
Dockerfile
~~~
FROM amazonlinux
COPY install.sh /
RUN chmod +x /install.sh
RUN /install.sh
COPY nginx.conf /opt/openresty/nginx/conf/nginx.conf
COPY common.conf /opt/openresty/nginx/conf/conf.d/common.conf
COPY my-poc-send2kafka.lua /opt/openresty/nginx/lua/my-poc-send2kafka.lua
EXPOSE 80
CMD sed -i "s/\$mypodip/$(hostname -i)/g" /opt/openresty/nginx/conf/conf.d/common.conf && /opt/openresty/nginx/sbin/nginx -c /opt/openresty/nginx/conf/nginx.conf
~~~

install.sh
~~~
#!/bin/sh
yum -y install readline-devel pcre-devel openssl-devel gcc wget tar gzip perl make unzip hostname
mkdir /opt/software
mkdir /opt/module
cd /opt/software/
wget https://openresty.org/download/openresty-1.9.7.4.tar.gz
tar -xzf openresty-1.9.7.4.tar.gz -C /opt/module/
cd /opt/module/openresty-1.9.7.4
./configure --prefix=/opt/openresty \
--with-luajit \
--without-http_redis2_module \
--with-http_iconv_module
make
make install
cd /opt/software/
wget https://github.com/doujiang24/lua-resty-kafka/archive/master.zip
unzip master.zip -d /opt/module/
cp -rf /opt/module/lua-resty-kafka-master/lib/resty/kafka/ /opt/openresty/lualib/resty/
mkdir /opt/openresty/nginx/lua/
mkdir /var/log/nginx/
~~~

nginx.conf
~~~
worker_processes auto;
worker_rlimit_nofile 100000;
daemon off;

events {
    worker_connections 102400;
    multi_accept on;
    use epoll;
}

http {
    include mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    resolver 8.8.8.8;
    #resolver 127.0.0.1 valid=3600s;
    sendfile on;
    keepalive_timeout 65;
    underscores_in_headers on;
    gzip on;
    include /opt/openresty/nginx/conf/conf.d/common.conf;
}
~~~

common.conf
~~~
lua_package_path "/opt/openresty/lualib/resty/kafka/?.lua;;";
lua_package_cpath "/opt/openresty/lualib/?.so;;";
lua_shared_dict ngx_cache 128m;
lua_shared_dict cache_lock 100k;

server {
    listen 80;
    server_name 127.0.0.1;
    root html;
    lua_need_request_body on;
    access_log /var/log/nginx/access.log main;
    # $mypodip is replaced with the pod IP by the sed command in the Dockerfile CMD
    error_log /var/log/nginx/error-$mypodip.log notice;
    location = /putmessage {
        lua_code_cache on;
        charset utf-8;
        default_type 'application/json';
        content_by_lua_file "/opt/openresty/nginx/lua/my-poc-send2kafka.lua";
    }
}
~~~

my-poc-send2kafka.lua
~~~
local producer = require("resty.kafka.producer")
local json = require("cjson")

local broker_list = {
    {host = "broker01", port = 9092},
    {host = "broker02", port = 9092},
    {host = "broker03", port = 9092}
}

local log_json = {}

-- ngx.req.read_body() returns nothing; it only ensures the body has been
-- read (a no-op here because lua_need_request_body is on).
ngx.req.read_body()
log_json["body_data"] = ngx.req.get_body_data()

-- The target Kafka topic comes from the "topic" request header.
local topic = ngx.req.get_headers()["topic"]

-- Invoked by the async producer when a batch cannot be delivered; the
-- failed batch ends up in the Nginx error log persisted on EFS.
local producer_error_handle = function(topic, partition_id, queue, index, err, retryable)
    ngx.log(ngx.ERR, "Error handle: index ", index, ' partition_id ', partition_id, ' retryable ', retryable, ' json ', json.encode(queue))
end

local bp = producer:new(broker_list, { producer_type = "async", batch_num = 200, error_handle = producer_error_handle})

local sendMsg = json.encode(log_json)

-- The async producer only buffers the message here; delivery errors are
-- reported through error_handle rather than this return value.
local ok, err = bp:send(topic, nil, sendMsg)
~~~

###
$ docker build -t my-poc-image .
###
```
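As written, the `/putmessage` handler returns an empty 200 response even when the `topic` header is missing or the local send buffer overflows. If clients need an explicit acknowledgment, the tail of `my-poc-send2kafka.lua` could be extended along the following lines. This is a sketch, not part of the original script; it reuses the `json`, `bp`, and `sendMsg` variables defined above, and the response shape is an assumption:

```
-- Reject requests that carry no topic header.
local topic = ngx.req.get_headers()["topic"]
if topic == nil or topic == "" then
    ngx.status = ngx.HTTP_BAD_REQUEST
    ngx.say(json.encode({accepted = false, reason = "missing topic header"}))
    -- ngx.exit(ngx.HTTP_OK) ends the request after the custom content is sent
    return ngx.exit(ngx.HTTP_OK)
end

local ok, err = bp:send(topic, nil, sendMsg)
if not ok then
    -- With producer_type = "async", send() fails immediately only on local
    -- errors such as buffer overflow; broker-side failures are reported
    -- later through error_handle.
    ngx.status = ngx.HTTP_INTERNAL_SERVER_ERROR
    ngx.say(json.encode({accepted = false, reason = err}))
    return ngx.exit(ngx.HTTP_OK)
end

ngx.say(json.encode({accepted = true}))
```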
##### **Step 4: Create the Amazon ECR Repository and Push the Image**

```
###
$ aws ecr create-repository \
    --repository-name my-poc-ecr/nginx-lua

$ aws ecr get-login-password --region regioncode | docker login --username AWS --password-stdin youraccountid.dkr.ecr.regioncode.amazonaws.com
$ docker tag my-poc-image:latest youraccountid.dkr.ecr.regioncode.amazonaws.com/my-poc-ecr/nginx-lua:latest
$ docker push youraccountid.dkr.ecr.regioncode.amazonaws.com/my-poc-ecr/nginx-lua:latest
###
```

##### **Step 5: Create the Amazon EFS File System**

```
###
$ aws efs create-file-system \
    --performance-mode generalPurpose \
    --throughput-mode bursting \
    --encrypted \
    --region ap-northeast-1 \
    --tags Key=Name,Value=my-poc-efs
###
```

Record the FileSystemId; *fsid* stands in for it.

```
###
$ aws efs create-mount-target \
    --file-system-id fsid \
    --subnet-id privatesubnetid01 \
    --security-groups securitygroupid

$ aws efs create-mount-target \
    --file-system-id fsid \
    --subnet-id privatesubnetid02 \
    --security-groups securitygroupid

$ aws efs create-mount-target \
    --file-system-id fsid \
    --subnet-id privatesubnetid03 \
    --security-groups securitygroupid
###
```

##### **Step 6: Create the Amazon EKS Cluster and Install Add-ons**

```
cluster.yaml
~~~
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-poc-eks-cluster
  region: ap-northeast-1
  version: "1.21"

iam:
  withOIDC: true

addons:
- name: vpc-cni
  version: v1.11.2-eksbuild.1
- name: coredns
  version: v1.8.4-eksbuild.1
- name: kube-proxy
  version: v1.21.2-eksbuild.2

vpc:
  subnets:
    private:
      ap-northeast-1a: { id: "privatesubnetid01" }
      ap-northeast-1c: { id: "privatesubnetid02" }
      ap-northeast-1d: { id: "privatesubnetid03" }

managedNodeGroups:
  - name: my-poc-ng-1
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 100
    privateNetworking: true
~~~

###
$ eksctl create cluster -f cluster.yaml
###
```

Refer to this [link](https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/aws-load-balancer-controller.html) to install the Amazon Load Balancer Controller.

Refer to this [link](https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/efs-csi.html) to install the Amazon EFS CSI driver.

##### **Step 7: Deploy on Amazon EKS**

Create deploy.yaml with the following content, then apply it.

```
deploy.yaml
~~~
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fsid
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-poc-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-lua
  labels:
    app: nginx-lua
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-lua
  template:
    metadata:
      labels:
        app: nginx-lua
    spec:
      containers:
      - name: nginx-lua
        image: youraccountid.dkr.ecr.regioncode.amazonaws.com/my-poc-ecr/nginx-lua:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
        volumeMounts:
        - name: my-poc-volume
          mountPath: /var/log/nginx
      volumes:
      - name: my-poc-volume
        persistentVolumeClaim:
          claimName: my-poc-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lua-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-subnets: publicsubnetid01,publicsubnetid02,publicsubnetid03
    service.beta.kubernetes.io/aws-load-balancer-name: my-poc-nginx-lua
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
spec:
  selector:
    app: nginx-lua
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
~~~

###
$ kubectl apply -f deploy.yaml
###
```

Use the following command to get the EXTERNAL-IP (the NLB DNS name); *nlbdns* stands in for it.

```
###
$ kubectl get svc nginx-lua-svc
###
```

##### **Step 8: Functional Testing**

Refer to this [link](https://docs.aws.amazon.com/zh_cn/msk/latest/developerguide/create-topic.html) to install the Kafka client.

Create a test topic.

```
###
$ ./kafka-topics.sh --topic my-poc-topic-01 --create --bootstrap-server broker01:9092,broker02:9092,broker03:9092 --partitions 3 --replication-factor 2
###
```

Install the ApacheBench testing tool and run a test.

```
###
$ sudo yum -y install httpd-tools
$ ab -T 'application/json' -H "topic: my-poc-topic-01" -n 1000 -p postdata.json http://nlbdns/putmessage
###

postdata.json
~~~
{
	"uuid": "2b8c376e-bd20-11e6-9ebf-525499b45be6",
	"event_time": "2016-12-08T18:08:12",
	"page": "www.example.com/poster.html",
	"element": "register",
	"attrs": 
	{
		"title": "test",
		"user_id": 1234
	}
}
~~~
```

Check that the messages can be consumed.

```
###
$ ./kafka-console-consumer.sh --bootstrap-server broker01:9092,broker02:9092,broker03:9092 --topic my-poc-topic-01 --from-beginning
###
```

The messages are consumed successfully.

![image.png](https://dev-media.amazoncloud.cn/bf127acabef64bfc88b20f3636294680_image.png)

Next, send messages to a topic that does not exist, to simulate the backend MSK being unavailable in production.

```
###
$ ab -T 'application/json' -H "topic: my-poc-topic-noexist" -n 1000 -p postdata.json http://nlbdns/putmessage
###
```

As expected, these messages are preserved as error logs on Amazon EFS. Open the error log named after the pod IP on the EFS file system, and the recorded error messages are there.

![image.png](https://dev-media.amazoncloud.cn/b830d6f0a1c144dda316b1ac36f6c0a8_image.png)
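The `error_handle` in `my-poc-send2kafka.lua` dumps an entire failed batch into the Nginx error log, where it sits interleaved with ordinary log prefixes. If the goal is to replay lost messages later, a variant that appends each failed batch as one JSON document per line to a dedicated file on the EFS volume is easier to parse mechanically. A minimal sketch, assuming the illustrative path below and the `json` (cjson) module already required by the script:

```
-- Alternative error handler (a sketch, not in the original): append
-- failed batches to a replay log on the EFS mount instead of the regular
-- error log. io.open blocks the worker briefly, which is tolerable on
-- this rare error path.
local producer_error_handle = function(topic, partition_id, queue, index, err, retryable)
    local f = io.open("/var/log/nginx/failed-messages.log", "a")  -- path is an assumption
    if not f then
        ngx.log(ngx.ERR, "cannot open replay log; original error: ", err)
        return
    end
    -- The queue layout is internal to lua-resty-kafka, so the batch is
    -- dumped as-is (exactly what the original handler logs) and decoded
    -- at replay time.
    f:write(json.encode({topic = topic, partition_id = partition_id,
                         retryable = retryable, err = err, batch = queue}), "\n")
    f:close()
end
```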
#### **About the Author**

![image.png](https://dev-media.amazoncloud.cn/85ab0851ac994575b7021ea2b5cad532_image.png)

#### **杨探**

Solutions Architect at Amazon Web Services, responsible for cloud architecture consulting and design for internet-industry customers.