ElasticSearch Import/Export
There are several ways to import and export indexes from ElasticSearch, but over the years some OpenAF scripts were written for this purpose and captured in two ojob.io ojobs:
- ojob.io/es/export
- ojob.io/es/import
As the names imply, one should be used to export data from ElasticSearch and the other to import data. The data is stored in ndjson (newline-delimited JSON) to make it easier to transport between different ElasticSearch environments and versions (with some limitations depending on the versions involved).
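For reference, ndjson is simply one JSON document per line; an exported file looks roughly like this (illustrative documents only, the exact fields written by ojob.io/es/export may differ):

```
{"@timestamp":"2022-10-01T10:00:00","type":"audit","field1":"abc"}
{"@timestamp":"2022-10-01T10:05:00","type":"audit","field1":"xyz"}
```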
To move data within the same ElasticSearch cluster you should instead use other features available in ElasticSearch, such as the reindex API.
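For instance, a minimal reindex call (this uses the standard ElasticSearch _reindex API; the index names are placeholders) could look like:

```
# copy all documents from one index to another within the same cluster
curl -X POST "https://my.elastic.search.cluster:9200/_reindex" \
  -H "Content-Type: application/json" \
  -d '{"source":{"index":"my-data-22"},"dest":{"index":"my-data-22-copy"}}'
```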
Exporting data
The ojob.io/es/export ojob currently has the following options (you can also check them by executing ojob ojob.io/es/export -jobhelp):
| Option | Mandatory | Description |
|---|---|---|
| url | yes | The ElasticSearch/OpenSearch instance URL |
| output | no | The output folder where the ndjson files will be created |
| user | no | The user credential, if necessary, to access the cluster |
| pass | no | The password credential, if necessary, to access the cluster |
| idx | no | A regular expression filter for the indices to export (otherwise all will be exported) |
| force | no | If force=true it will not check if there is a previous ndjson file for each index |
| filter | no | Comma-separated pairs of field:value filters (for example: "field1:abc,field2:xyz") |
| notfilter | no | Comma-separated pairs of field:value exclusion filters (for example: "field1:abc,field2:xyz") |
| from | no | Only documents with a date greater than or equal to this value (for example: 2022-03-04T01:02:03) |
| to | no | Only documents with a date lower than or equal to this value (for example: 2022-03-02T02:03:04) |
Every time it runs (without the force option), ojob.io/es/export will check if there is previous data for each index. If, for example, you have a new daily index and you also run ojob.io/es/export daily, it will not try to export data again from the previous indexes found in the same output folder.
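If you do want to re-export indices that already have a ndjson file in the output folder, add force=true (all options as documented in the table above):

```
ojob ojob.io/es/export url=https://my.elastic.search.cluster:9200 output=out idx=my-data* force=true
```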
Note: output files will be automatically gzipped to save space.
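Since the files are gzipped, they can be inspected with standard tools; for example, to peek at the first exported documents:

```
# decompress on the fly and show the first 3 documents of an exported index
zcat out/my-data-22.ndjson.gz | head -n 3
```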
Examples:
Export data to folder ‘out’ from all indices that start with ‘my-data’:
```
ojob ojob.io/es/export url=https://my.elastic.search.cluster:9200 output=out idx=my-data*
```
Export data to folder ‘out’ from all indices that start with ‘my-data’ where the corresponding documents have a field ‘type’ with the value ‘audit’:
```
ojob ojob.io/es/export url=https://my.elastic.search.cluster:9200 output=out idx=my-data* filter=type:audit
```
Export data to folder ‘out’ from all indices that start with ‘my-data’ where the corresponding documents have a field ‘@timestamp’ between the 1st of October 2022 and the end of the 2nd of October 2022:
```
ojob ojob.io/es/export url=https://my.elastic.search.cluster:9200 output=out idx=my-data* from=2022-10-01T00:00:00 to=2022-10-02T23:59:59
```
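Export data to folder ‘out’ from all indices that start with ‘my-data’ excluding documents where the field ‘type’ has the value ‘debug’ (an illustrative use of the notfilter option documented above):

```
ojob ojob.io/es/export url=https://my.elastic.search.cluster:9200 output=out idx=my-data* notfilter=type:debug
```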
Importing data
After exporting data you might need to import it. To do this you can use the ojob.io/es/import ojob:
```
ojob ojob.io/es/import url=http://my.elastic2.search.cluster:9200 file=out/my-data-22.ndjson.gz index=their-data-22
```
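After the import finishes, a quick sanity check (using the standard ElasticSearch _count API, not part of the ojob itself) is to compare the document count of the target index with the source:

```
# count the documents now present in the target index
curl "http://my.elastic2.search.cluster:9200/their-data-22/_count"
```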
The ojob.io/es/import ojob currently has the following options (you can also check them by executing ojob ojob.io/es/import -jobhelp):
| Option | Mandatory | Description |
|---|---|---|
| url | yes | The ElasticSearch/OpenSearch instance URL |
| user | no | The user credential, if necessary, to access the cluster |
| pass | no | The password credential, if necessary, to access the cluster |
| index | yes | The index to which the data will be imported |
| file | yes | The ndjson file (.gz or not) with the data to import (usually the result of ojob.io/es/export) |
Using them in an air-gapped environment
To use these ojobs in an air-gapped environment (without Internet access) you will need:
- the latest OpenAF version
- the ojob.io/es/export yaml file (ojob ojob.io/get job=ojob.io/es/export airgap=true)
- the ojob.io/es/import yaml file (ojob ojob.io/get job=ojob.io/es/import airgap=true)
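As a minimal sketch, assuming the yaml files were saved locally (the filenames below are hypothetical; use whatever names you gave them when downloading), they can then be executed directly by ojob without Internet access:

```
# run the locally saved export job definition (es-export.yaml is a hypothetical filename)
ojob es-export.yaml url=https://my.elastic.search.cluster:9200 output=out idx=my-data*

# run the locally saved import job definition (es-import.yaml is a hypothetical filename)
ojob es-import.yaml url=http://my.elastic2.search.cluster:9200 file=out/my-data-22.ndjson.gz index=their-data-22
```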