使用 SkyWalking 监控 Amazon EKS 和 S3

API
Amazon Simple Storage Service (S3)
Amazon Kinesis
Amazon Elastic Kubernetes Service (EKS)
0
1
![6ca22d19cf3a0330d016b674631aa7b.jpg](https://dev-media.amazoncloud.cn/58f515e0184e47639ff0c85d4b2fb219_6ca22d19cf3a0330d016b674631aa7b.jpg "6ca22d19cf3a0330d016b674631aa7b.jpg") SKyWalking OAP 现有的 [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/?trk=cndc-detai) 可以通过 [OTLP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md?trk=cndc-detail) 协议接收指标(metrics),并且使用 [MAL](https://skywalking.apache.org/docs/main/next/en/concepts-and-designs/mal/?trk=cndc-detail) 实时分析相关指标。从 OAP 9.4.0 开始,SkyWalking 新增了 [Amazon Firehose receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/aws-firehose-receiver/?trk=cndc-detail),用来接收,分析 CloudWatch metrics 数据。本文将以 EKS 和 S3 为例介绍 SkyWalking OAP 接收,分析亚马逊云科技服务的指标数据的过程。 ### EKS #### OpenTelemetry Collector [OpenTelemetry (OTel)](https://opentelemetry.io/?trk=cndc-detail) 是一系列 tools,API,SDK,可以生成,收集,导出遥测数据,比如 指标(metrics),日志(logs)和链路信息(traces),而 OTel Collector 主要负责收集、处理和导出遥测数据,Collector 由以下主要组件组成: 1. receiver: 负责获取遥测数据,不同的 receiver 支持不同的数据源,比如 prometheus,kafka,otlp, 2. processor:在 receiver 和 exporter 之间处理数据,比如增加或者删除attributes, 3. exporter:负责发送数据到不同的后端,比如 kafka,SkyWalking OAP(通过 OTLP) 4. service: 作为一个单元配置启用的组件,只有配置的组件才会被启用 ###### OpenTelemetry Protocol Specification(OTLP) [OTLP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md?trk=cndc-detail) 主要描述了如何通过gRPC,HTTP协议接收(拉取)指标数据。SKyWalking OAP的 [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/?trk=cndc-detail) 实现了 OTLP/gRPC 协议,通过 OTLP/gRPC exporter 可以将指标数据导出到 OAP。通常一个 Collector 的数据流向如下: ![0.png](https://dev-media.amazoncloud.cn/6f273515d197412da239e3bc45ad1101_0.png "0.png") #### 使用 OTel 监控 EKS EKS 的监控就是通过 OTel 实现的,只需在 EKS 集群中以` DaemonSet `的方式部署 OpenTelemetry Collector,使用 [Amazon Container Insights Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/awscontainerinsightreceiver/README.md?trk=cndc-detail) 作为 receiver,并且设置 otlp exporter 的地址为 OAP 的的地址即可。另外需要注意的是 OAP 根据 attribute `job_name : aws-cloud-eks-monitoring` 作为 EKS metrics 的标识,所以还需要再 collector 中配置一个 processor 来增加这个属性 ###### OTel Collector 配置 demo ```yaml extensions: health_check: receivers: awscontainerinsightreceiver: processors: # 为了OAP能够正确识别EKS metrics,增加job_name attribute resource/job-name: attributes: - key: job_name value: aws-cloud-eks-monitoring action: insert # 指定OAP作为 exporters exporters: otlp: endpoint: oap-service:11800 tls: insecure: true logging: loglevel: debug service: pipelines: metrics: receivers: [awscontainerinsightreceiver] processors: [resource/job-name] exporters: [otlp,logging] extensions: [health_check] ``` SkyWalking OAP 默认统计 Node,Pod,Service 三个维度的网络、磁盘、CPU等相关的指标数据,这里仅展示了部分内容 ###### Pod 维度 ![1.png](https://dev-media.amazoncloud.cn/76918020f33e4a1a825a931cb08a92ab_1.png "1.png") ###### Service 维度 ![2.png](https://dev-media.amazoncloud.cn/18575a9cfd0f44a0bda7d7a8b21139aa_2.png "2.png") ###### EKS 监控完整配置 - Click here to view complete k8s resource configuration ```yaml apiVersion: v1 kind: ServiceAccount metadata: name: aws-otel-sa namespace: aws-otel-eks --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: aoc-agent-role rules: - apiGroups: [""] resources: ["pods", "nodes", "endpoints"] verbs: ["list", "watch"] - apiGroups: ["apps"] resources: ["replicasets"] verbs: ["list", "watch"] - apiGroups: ["batch"] resources: ["jobs"] verbs: ["list", "watch"] - apiGroups: [""] resources: ["nodes/proxy"] verbs: ["get"] - apiGroups: [""] resources: ["nodes/stats", "configmaps", "events"] verbs: ["create", "get"] - apiGroups: [""] resources: ["configmaps"] resourceNames: ["otel-container-insight-clusterleader"] verbs: ["get","update"] - apiGroups: ["coordination.k8s.io"] resources: ["leases"] verbs: ["create","get","update"] --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: aoc-agent-role-binding subjects: - kind: ServiceAccount name: aws-otel-sa namespace: aws-otel-eks roleRef: kind: ClusterRole name: aoc-agent-role apiGroup: rbac.authorization.k8s.io --- apiVersion: v1 kind: ConfigMap metadata: name: otel-agent-conf namespace: aws-otel-eks labels: app: opentelemetry component: otel-agent-conf data: otel-agent-config: | extensions: health_check: receivers: awscontainerinsightreceiver: processors: resource/job-name: attributes: - key: job_name value: aws-cloud-eks-monitoring action: insert exporters: otlp: endpoint: oap-service:11800 tls: insecure: true logging: loglevel: debug service: pipelines: metrics: receivers: [awscontainerinsightreceiver] processors: [resource/job-name] exporters: [otlp,logging] extensions: [health_check] --- apiVersion: apps/v1 kind: DaemonSet metadata: name: aws-otel-eks-ci namespace: aws-otel-eks spec: selector: matchLabels: name: aws-otel-eks-ci template: metadata: labels: name: aws-otel-eks-ci spec: containers: - name: aws-otel-collector image: amazon/aws-otel-collector:v0.23.0 env: # Specify region - name: AWS_REGION value: "ap-northeast-1" - name: K8S_NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: HOST_IP valueFrom: fieldRef: fieldPath: status.hostIP - name: HOST_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: K8S_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace imagePullPolicy: Always command: - "/awscollector" - "--config=/conf/otel-agent-config.yaml" volumeMounts: - name: rootfs mountPath: /rootfs readOnly: true - name: dockersock mountPath: /var/run/docker.sock readOnly: true - name: varlibdocker mountPath: /var/lib/docker readOnly: true - name: containerdsock mountPath: /run/containerd/containerd.sock readOnly: true - name: sys mountPath: /sys readOnly: true - name: devdisk mountPath: /dev/disk readOnly: true - name: otel-agent-config-vol mountPath: /conf - name: otel-output-vol mountPath: /otel-output resources: limits: cpu: 200m memory: 200Mi requests: cpu: 200m memory: 200Mi volumes: - configMap: name: otel-agent-conf items: - key: otel-agent-config path: otel-agent-config.yaml name: otel-agent-config-vol - name: rootfs hostPath: path: / - name: dockersock hostPath: path: /var/run/docker.sock - name: varlibdocker hostPath: path: /var/lib/docker - name: containerdsock hostPath: path: /run/containerd/containerd.sock - name: sys hostPath: path: /sys - name: devdisk hostPath: path: /dev/disk/ - name: otel-output-vol hostPath: path: /otel-output serviceAccountName: aws-otel-sa ``` ### S3 #### Amazon CloudWatch [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html?trk=cndc-detail) 是亚马逊云科技提供的监控服务,负责收集亚马逊云科技服务,资源的指标数据,CloudWatch metrics stream 负责将指标数据转换为流式处理数据,支持输出 json,OTel v0.7.0 两种格式。 #### Amazon Kinesis Data Firehose (Firehose) [Firehose](https://aws.amazon.com/cn/kinesis/data-firehose/?trk=cndc-detail) 是一项提取、转换、加载(ETL)服务,可以将流式处理数据以可靠方式捕获、转换和提供到数据湖、数据存储(比如 S3)和分析服务中。 为了确保外部服务能够正确地接收指标数据, 亚马逊云科技提供了 [Kinesis Data Firehose HTTP Endpoint Delivery Request and Response Specifications (Firehose Specifications)](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html?trk=cndc-detail)。Firhose 以 POST 的方式推送 Json 数据 ###### Json 数据示例 ```json { "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f", "timestamp": 1578090901599 "records": [ { "data": "aGVsbG8=" }, { "data": "aGVsbG8gd29ybGQ=" } ] } ``` 1. requestId: 请求 id,可以实现去重,debug 目的 2. timestamp: Firehose 产生该请求的时间戳(毫秒) 3. records: 实际投递的记录 1. data: 投递的数据,以base64编码数据,可以是 json 或者 OTel v0.7.0 格式,取决于 CloudWatch 数据数据的格式(稍后会有描述)。Skywalking 目前支持 OTel v0.7.0 格式 ##### aws-firehose-receiver `aws-firehose-receiver` 就是提供了一个实现了 Firehose Specifications 的 HTTP Endpoint:`/aws/firehose/metrics`。下图展示了通过 CloudWatch 监控 DynamoDB,S3 等服务,并利用 Firehose 将指标数据发送到 SKywalking OAP 的数据流向。 ![3.png](https://dev-media.amazoncloud.cn/a9ba19624c20447cb38d3c17fc0f002a_3.png "3.png") 从上图可以看到 `aws-firehose-receiver` 将数据转换后交由 `OpenTelemetry-receiver`处理 ,所以 [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/?trk=cndc-detail) 中配置的 `otel-rules` 同样可以适用 CloudWatch metrics ###### 注意 * 因为 Kinesis Data Firehose 要求,必须在 Amazon Firehose receiver 前放置一个 Gateway 用来建立 HTTPS 链接。`aws-firehose-receiver` 将从 v9.5.0 开始支持 HTTPS 协议 * TLS 证书必须是 CA 签发的 #### 逐步设置 S3 监控 1. 进入 S3 控制台,通过 `Amazon S3 >> Buckets >> (Your Bucket) >> Metrics >> metrics >> View additional charts >> Request metrics` 为 `Request metrics` 创建 filter ![4.png](https://dev-media.amazoncloud.cn/319ec5cddeb845b0b37092b7f61bc3ff_4.png "4.png") 2. 进入 Amazon Kinesis 控制台,创建一个 delivery stream, `Source`选择 `Direct PUT`, `Destination` 选择 `HTTP Endpoint`. 并且设置`HTTP endpoint URL` 为 `https://your_domain/aws/firehose/metrics`。其他配置项: * `Buffer hints`: 设置缓存的大小和周期 * `Access key` 与 aws-firehose-receiver 中的 AccessKey 一致即可 * `Retry duration`: 重试周期 * `Backup settings`: 备份设置,可选地将投递的数据同时备份到 S3。 ![5.png](https://dev-media.amazoncloud.cn/d358697334054d679eff0d4376bfa14d_5.png "5.png") 3. 进入 CloudWatch 控制台,`Streams` 标签创建 CloudWatch Stream。并且在`Select your Kinesis Data Firehose stream`项中配置第二步创建的 delivery stream。注意需要设置`Change output format` 为 `OpenTelemetry v0.7.0`。 ![6.png](https://dev-media.amazoncloud.cn/104b3a245ab84076a402a176a5a95da5_6.png "6.png") 至此,S3 监控配置设置完成。目前 SkyWalking 默认收集的 S3 metrics 展示如下 ![7.png](https://dev-media.amazoncloud.cn/82ac1066c3934aa4959d801c734cd5ee_7.png "7.png") ### 其他服务 目前 SkyWalking 官方支持 EKS,S3,DynamoDB 监控。 用户也参考 [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/?trk=cndc-detail) 配置 OTel rules 来收集,分析亚马逊云科技其他服务的 CloudWatch metrics,并且通过[自定义 dashboard](https://skywalking.apache.org/docs/main/next/en/ui/readme/?trk=cndc-detail") 展示 ### 资料 * [Monitoring S3 metrics with Amazon CloudWatch](https://docs.aws.amazon.com/AmazonS3/latest/userguide/cloudwatch-monitoring.html?trk=cndc-detail) * [Monitoring DynamoDB metrics with Amazon CloudWatch](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/monitoring-cloudwatch.html?trk=cndc-detail) * [Supported metrics in Amazon Firehose receiver of OAP](https://skywalking.apache.org/docs/main/next/en/setup/backend/aws-firehose-receiver/?trk=cndc-detail) * [Configuration Vocabulary | Apache SkyWalking](https://skywalking.apache.org/docs/main/next/en/setup/backend/configuration-vocabulary/?trk=cndc-detail) 文章审核:亚马逊云科技 Container Hero **吴晟**
1
目录
关闭
contact-us