> **本文作者 韩旭**
>
> 北京西云数据科技有限公司
>
> 亚马逊云技术支持亚马逊云科技云博主
在当今的数据驱动时代,搜索引擎的快照功能在数据保护和灾难恢复中至关重要。
**本文将对 Easysearch、Elasticsearch 和 Amazon OpenSearch 的快照兼容性进行比较**,分析它们在快照创建、恢复、存储格式和跨平台兼容性等方面的特点,帮助大家更好地理解这些搜索引擎的差异,从而选择最适合自己需求的解决方案。
### **启动集群**
#### **Easysearch**
服务器一般情况下默认参数都是很低的,而 Easysearch 和 Elasticsearch 是内存大户,所以就需要进行系统调优。
```js
sysctl -w vm.max_map_count=262144
```
`vm.max_map_count` 是一个 Linux 内核参数,用于控制单个进程可以拥有的最大内存映射区域(VMA,Virtual Memory Areas)的数量。内存映射区域是指通过内存映射文件或匿名内存映射创建的虚拟内存区域。
这个参数在一些应用程序中非常重要,尤其是那些需要大量内存映射的应用程序,比如 Elasticsearch。Elasticsearch 使用内存映射文件来索引和搜索数据,这可能需要大量的内存映射区域。如果 `vm.max_map_count` 设置得太低,Elasticsearch 可能无法正常工作,并会出现错误信息。
调整 `vm.max_map_count` 参数的一些常见原因:
**1.支持大型数据集:**
应用程序(如 Elasticsearch)在处理大型数据集时可能需要大量内存映射区域。增加 `vm.max_map_count` 可以确保这些应用程序有足够的内存映射区域来处理数据。
**2.防止内存错误:**
如果 `vm.max_map_count` 设置得太低,当应用程序尝试创建超过限制的内存映射时,会出现错误,导致应用程序崩溃或无法正常工作。
**3.优化性能:**
适当地设置 `vm.max_map_count` 可以优化应用程序的性能,确保内存映射操作顺利进行。
检查当前的` vm.max_map_count` 值:
```js
sysctl vm.max_map_count
```
或者查看 `/proc/sys/vm/max_map_count` 文件:
```js
cat /proc/sys/vm/max_map_count
```
Elasticsearch 官方建议将 `vm.max_map_count` 设置为至少262144。对于其他应用程序。
Easysearch 具体安装步骤见
- [INFINI Easysearch 尝鲜 Hands on](https://blog.csdn.net/weixin_38781498/article/details/140077785?trk=cndc-detail)
#### **Amazon OpenSearch**
使用亚马逊云科技控制台进行创建。
#### **Elasticsearch**
使用如下 Docker Compose 部署一个三节点的 ES 集群:
```js
version: '2.2'
services:
es01:
image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
container_name: es01
environment:
- node.name=es01
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es02,es03
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data01:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- elastic
es02:
image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
container_name: es02
environment:
- node.name=es02
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es03
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data02:/usr/share/elasticsearch/data
networks:
- elastic
es03:
image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
container_name: es03
environment:
- node.name=es03
- cluster.name=es-docker-cluster
- discovery.seed_hosts=es01,es02
- cluster.initial_master_nodes=es01,es02,es03
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- data03:/usr/share/elasticsearch/data
networks:
- elastic
volumes:
data01:
driver: local
data02:
driver: local
data03:
driver: local
networks:
elastic:
driver: bridge
```
由于这个 Docker Compose 没有关于 Kibana 的配置,所以我们还是用 Console 添加原生的 Elasticsearch 集群
集群信息
![image.png](https://dev-media.amazoncloud.cn/764ddba200e3486da218eaf9a1c69808_image.png "image.png")
### **快照还原的步骤**
#### **快照前的准备**
#### **插件安装**
本次测试选择把索引快照备份到 [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail),所以需要使用 [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) repository plugin,这个插件添加了对使用 [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) 作为快照或恢复存储库的支持。
Easysearch 和 OpenSearch 集群自带了这个插件,所以无需额外安装。
对于自己部署的三节点 Elasticsearch 则需要进入每一个节点运行安装命令然后再重启集群,建议使用自动化运维工具来做这步,安装命令如下:
```js
sudo bin/elasticsearch-plugin install repository-s3
```
如果不再需要这个插件,可以这样删除。
```js
sudo bin/elasticsearch-plugin remove repository-s3
```
由于需要和亚马逊云科技打交道,所以我们需要设置 Amazon IAM 凭证,这个插件可以从 [Amazon EC2 ](https://aws.amazon.com/cn/ec2/?trk=cndc-detail)IAM instance profile,[Amazon ECS](https://aws.amazon.com/cn/ecs/?trk=cndc-detail) task role 以及 [Amazon EKS](https://aws.amazon.com/cn/eks/?trk=cndc-detail) 的 Service account 读取相应的凭证。
对于托管的 Amazon OpenSearch 来说,我们无法在托管的 [Amazon EC2 ](https://aws.amazon.com/cn/ec2/?trk=cndc-detail)上绑定我们的凭证,所以需要新建一个 OpenSearchSnapshotRole,然后通过当前的用户把这个角色传递给服务,也就是我们说的 IAM:PassRole。
创建 OpenSearchSnapshotRole,策略如下:
```js
{
"Version": "2012-10-17",
"Statement": [{
"Action": [
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::bucket-name"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::bucket-name/*"
]
}
]
}
```
信任关系如下:
```js
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {
"Service": "es.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
```
然后在我们的 Amazon IAM user 上加上 PassRole 的权限,这样我们就可以把 OpenSearchSnapshotRole 传递给 OpenSearch 集群。
```js
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::123456789012:role/OpenSearchSnapshotRole"
}
]
}
```
#### **注册存储库**
在源集群执行注册
```js
PUT /_snapshot/snapshot-repo-name
{
"type": "s3",
"settings": {
"bucket": "<bucket-name>",
"base_path": "<bucket-prefix>",
}
}
```
在目标集群同样执行这个语句,为了防止覆盖源集群存储库的数据,将“readonly”: true 添加到“settings” PUT 请求中,这样就只有一个集群具有对存储库的写入权限。
如果Bucket在中国区,那么还需要加上 **endpoint: https://s3.< region >.amazonaws.com.cn** 这样的参数。
```js
PUT /_snapshot/snapshot-repo-name
{
"type": "s3",
"settings": {
"bucket": "<bucket-name>",
"base_path": "<bucket-prefix>",
"readonly": true,
}
}
```
对于 OpenSearch 来说,还需要执行 passrole,所以还需要添加 role_arn 这个字段,由于 IAM:PassRole 需要对 HTTP 请求做 signV4 日签名,所以这部常常使用 Postman 来完成。把角色传递过去之后,接下来的快照还原操作就可以在 OpenSearch Dashboard 中进行操作了。
![image.png](https://dev-media.amazoncloud.cn/d275a6167322456a9db1ef08aa06474b_image.png "image.png")
需要注意的是,需要在 auth 这里输入 AccessKey,SecretKey,Amazon Region,Service Name(es)来做 SignV4 的签名。
![image.png](https://dev-media.amazoncloud.cn/5f27beeb4b95458d93b702c0bd5f2e0d_image.png "image.png")
请求体如下:
```js
{
"type": "s3",
"settings": {
"bucket": "<bucket-name>",
"base_path": "<bucket-prefix>",
"readonly": true,
"role_arn": "arn:aws:iam::123456789012:role/OpenSearchSnapshotRole"
}
}
```
- **查看所有注册的存储库:**
`GET _snapshot`:这个命令返回所有已注册的快照存储库列表及其基本信息。
```js
GET _snapshot
```
```js
{
"es_repository": {
"type": "s3",
"settings": {
"bucket": "your-s3-bucket-name",
"region": "your-s3-bucket-region"
}
}
}
```
- **查看特定存储库中的快照:**
`GET _cat/snapshots/es_repository?v`:这个命令返回 `es_repository` 存储库中的所有快照及其详细信息,包括快照 ID、状态、开始时间、结束时间、持续时间、包含的索引数量、成功和失败的分片数量等。
```js
GET _cat/snapshots/es_repository?v
```
```js
id status start_epoch start_time end_epoch end_time duration indices successful_shards failed_shards total_shards
snapshot_1 SUCCESS 1628884800 08:00:00 1628888400 09:00:00 1h 3 10 0 10
snapshot_2 SUCCESS 1628971200 08:00:00 1628974800 09:00:00 1h 3 10 0 10
```
#### **创建索引快照**
```js
# PUT _snapshot/my_repository/<my_snapshot_{now/d}>
PUT _snapshot/my_repository/my_snapshot
{
"indices": "my-index,logs-my_app-default",
}
```
根据快照的大小不同,完成快照可能需要一些时间。默认情况下,create snapshot API 只会异步启动快照过程,该过程在后台运行。要更改为同步调用,可以将 `wait_for_completion` 查询参数设置为 `true`。
```js
PUT _snapshot/my_repository/my_snapshot?wait_for_completion=true
```
另外还可以使用 clone snapshot API 克隆现有的快照。要监控当前正在运行的快照,可以使用带有 `_current` 请求路径参数的 get snapshot API。
```js
GET _snapshot/my_repository/_current
```
如果要获取参与当前运行快照的每个分片的完整详细信息,可以使用 get snapshot status API。
```js
GET _snapshot/_status
```
成功创建快照之后,就可以在 [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) 上看到备份的数据块文件,这个是正确的快照层级结构:
![image.png](https://dev-media.amazoncloud.cn/7bdbf0276efe42be8156fc8e18ab7b60_image.png "image.png")
需要注意的是, “base_path”: “”这里最好不要加/,虽然不影响同集群迁移,这个会为我们在不同厂商的搜索引擎中迁移遇到问题,可能是这样的,所以需要注意。
![image.png](https://dev-media.amazoncloud.cn/963c258b1fa9406c80510c89e4f307f1_image.png "image.png")
所以在 Open Search 中还原 Elasticsearch 就遇到了这个问题:
```js
{
"error": {
"root_cause": [
{
"type": "snapshot_missing_exception",
"reason": "[easy_repository:2/-jOQ0oucQDGF3hJMNz-vKQ] is missing"
}
],
"type": "snapshot_missing_exception",
"reason": "[easy_repository:2/-jOQ0oucQDGF3hJMNz-vKQ] is missing",
"caused_by": {
"type": "no_such_file_exception",
"reason": "Blob object [11111/indices/7fv2zAi4Rt203JfsczUrBg/meta-YGnzxZABRBxW-2vqcmci.dat] not found: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: R71DDHX4XXM0434T, Extended Request ID: d9M/HWvPvMFdPhB6KX+wYCW3ZFqeFo9EoscWPkulOXWa+TnovAE5PlemtuVzKXjlC+rrgskXAus=)"
}
},
"status": 404
}
```
恢复索引快照
```js
POST _snapshot/my_repository/my_snapshot_2099.05.06/_restore
{
"indices": "my-index,logs-my_app-default",
}
```
### **各个集群的还原**
1.Elasticsearch 7.10.2 的快照可以还原到 Easysearch 和 Amazon OpenSearch
2.从Easysearch 1.8.2 还原到 Elasticsearch 7.10.2 报错如下:
```js
{
"error": {
"root_cause": [
{
"type": "snapshot_restore_exception",
"reason": "[s3_repository:1/a2qV4NYIReqvgW6BX_nxxw] cannot restore index [my_indexs] because it cannot be upgraded"
}
],
"type": "snapshot_restore_exception",
"reason": "[s3_repository:1/a2qV4NYIReqvgW6BX_nxxw] cannot restore index [my_indexs] because it cannot be upgraded",
"caused_by": {
"type": "illegal_state_exception",
"reason": "The index [[my_indexs/ALlTCIr0RJqtP06ouQmf0g]] was created with version [1.8.2] but the minimum compatible version is [6.0.0-beta1]. It should be re-indexed in Elasticsearch 6.x before upgrading to 7.10.2."
}
},
"status": 500
}
```
3.从 Amazon OpenSearch 2.1.3 还原到 Elasticsearch 7.10.2 报错如下(无论是否开启兼容模式):
```js
{
"error": {
"root_cause": [
{
"type": "snapshot_restore_exception",
"reason": "[aos:2/D-oyYSscSdCbZFcmPZa_yg] the snapshot was created with Elasticsearch version [36.34.78-beta2] which is higher than the version of this node [7.10.2]"
}
],
"type": "snapshot_restore_exception",
"reason": "[aos:2/D-oyYSscSdCbZFcmPZa_yg] the snapshot was created with Elasticsearch version [36.34.78-beta2] which is higher than the version of this node [7.10.2]"
},
"status": 500
}
```
4.从 Easysearch 1.8.2 还原到 Amazon OpenSearch2.13 报错如下(无论是否开启兼容模式):
```js
{
"error": {
"root_cause": [
{
"type": "snapshot_restore_exception",
"reason": "[easy_repository:2/LE18AWHlRJu9rpz9BJatUQ] cannot restore index [my_indexs] because it cannot be upgraded"
}
],
"type": "snapshot_restore_exception",
"reason": "[easy_repository:2/LE18AWHlRJu9rpz9BJatUQ] cannot restore index [my_indexs] because it cannot be upgraded",
"caused_by": {
"type": "illegal_state_exception",
"reason": "The index [[my_indexs/VHOo7yfDTRa48uhQvquFzQ]] was created with version [1.8.2] but the minimum compatible version is OpenSearch 1.0.0 (or Elasticsearch 7.0.0). It should be re-indexed in OpenSearch 1.x (or Elasticsearch 7.x) before upgrading to 2.13.0."
}
},
"status": 500
}
```
5.Amazon OpenSearch 还原到 Easysearch 同样失败
```js
{
"error": {
"root_cause": [
{
"type": "snapshot_restore_exception",
"reason": "[aoss:2/D-oyYSscSdCbZFcmPZa_yg] cannot restore index [aos] because it cannot be upgraded"
}
],
"type": "snapshot_restore_exception",
"reason": "[aoss:2/D-oyYSscSdCbZFcmPZa_yg] cannot restore index [aos] because it cannot be upgraded",
"caused_by": {
"type": "illegal_state_exception",
"reason": "The index [[aos/864WjTAXQCaxJ829V5ktaw]] was created with version [36.34.78-beta2] but the minimum compatible version is [6.0.0]. It should be re-indexed in Easysearch 6.x before upgrading to 1.8.2."
}
},
"status": 500
}
```
6.Elasticsearch 8.14.3 迁移到 Amazon OpenSearch 或者 Elasticsearch 都是有这个报错:
```js
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Failed to parse object: unknown field [uuid] found",
"line": 1,
"col": 25
}
],
"type": "repository_exception",
"reason": "[snap] Unexpected exception when loading repository data",
"caused_by": {
"type": "parsing_exception",
"reason": "Failed to parse object: unknown field [uuid] found",
"line": 1,
"col": 25
}
},
"status": 500
}
```
这是由于 Elasticsearch8 在创建快照的时候会默认加上一个 UUID 的字段,所以我们低版本的 Easysearch、Amazon OpenSearch 中会找不到这个字段,在执行 `GET _cat/snapshots/snap?v` 的时候就报错,及时在注册存储库的时候显示加上 UUID 的字段也无事无补。
```js
{
"snapshot-repo-name": {
"type": "s3",
"uuid": "qlJ0uqErRmW6aww2Fyt4Fg",
"settings": {
"bucket": "<bucket-name>",
"base_path": "<bucket-prefix>",
}
}
```
以下是兼容性对比,每行第一列代表源集群,第一行代表目标集群:
![image.png](https://dev-media.amazoncloud.cn/20f4f74bd5f64d47a982f7537dec0dd4_image.png "image.png")
Elasticsearch 的兼容列表官方的列表如下:
![image.png](https://dev-media.amazoncloud.cn/11ba0cd750d9410f8d4cb693e86e4a05_image.png "image.png")
**参考文献**
- [开始使用Elastic Stack和Docker Compose:第 1 部分](https://www.elastic.co/cn/blog/getting-started-with-the-elastic-stack-and-docker-compose?trk=cndc-detail)
- [Docker Compose部署多节点Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/docker.html#docker-compose-file?trk=cndc-detail)
- repository-s3教程
https://www.elastic.co/guide/en/elasticsearch/reference/8.14/repository-s3.html?trk=cndc-detail
https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3.html?trk=cndc-detail
- [snapshot-restore](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/snapshot-restore.html?trk=cndc-detail)
- [在Amazon OpenSearch服务中创建索引快照](https://docs.amazonaws.cn/zh_cn/opensearch-service/latest/developerguide/managedomains-snapshots.html#managedomains-snapshot-restore?trk=cndc-detail)
- [教程:迁移至Amazon OpenSearch Service](https://docs.amazonaws.cn/zh_cn/opensearch-service/latest/developerguide/migration.html?trk=cndc-detail)
![image.png](https://dev-media.amazoncloud.cn/0415c94e8da44d29a19d9ae07655bc79_image.png "image.png")