Data Export¶
Warning
Data Export is an advanced feature currently in Public Preview. Please file a support ticket if you wish to participate in the preview program. We are also rolling out to new regions regularly, so check whether your region is currently supported.
Overview¶
Observe provides industry-leading retention, but there are scenarios where you may wish to move data in Observe to an AWS S3 bucket owned by your organization. You can use Observe’s Data Export feature to move data automatically from any Event Dataset to an S3 bucket via two distinct job types: duplicate data and post-retention data.
S3 Data Export is a good fit for the following scenarios:
You need to retain data for compliance purposes, but it does not need to be searchable.
You need to share subsets of Observe data to other teams.
You need your data to remain portable after ingestion.
When using Data Export, be aware of the following prerequisites:
The S3 bucket must be in the same region as your Observe tenant.
Exports can lag event arrival by up to 2 hours.
You can only export data from Event Datasets.
Hibernation cannot be applied to datasets that are associated with an S3 Export job.
Export Jobs¶
To create an export job, navigate to the Account Settings page of your Observe tenant and select the Data export option from the left navigation.
Observe supports two export job types. Export jobs can write either NDJSON format (gzip compression) or Parquet format (snappy compression). All export job names must be unique, and all export Destination values must end with a trailing / character. The two job types are:
Duplicate data¶
This export job continuously copies data from the selected Dataset to the designated S3 bucket, starting at job creation and exporting new data as it arrives.
Post-retention data¶
This export job copies data from the selected Dataset to the designated S3 bucket after the data has reached its retention limit.
Configuring your S3 Bucket for Data Export¶
To use the Data Export feature, you must ensure the following:
Your bucket must be in the same AWS region as your Observe tenant.
Your bucket policy must allow Observe to access your bucket.
Bucket Region¶
Your bucket must be in the same region as your Observe tenant. When creating a job, the Observe UI modal will display the region that your bucket must be located in.
To view the region your bucket is in, navigate to the bucket page in the Amazon S3 UI and click the Properties tab. The bucket region is listed under AWS Region in the Bucket overview section, as shown in Figure 1.

Figure 1 - View bucket region
If your bucket is not in the required region, we recommend creating a new bucket.
On the Create bucket page in the Amazon S3 UI, the region of the bucket can be configured by selecting from the drop-down menu at the top of the screen, as shown in Figure 2.

Figure 2 - Create bucket, choose region
Bucket Policy¶
For the Data Export feature to work, your S3 bucket policy must allow Observe to access your bucket and perform several basic actions. This can be done by adding statements to your bucket policy. The Observe UI generates the statements that you need to add to your policy. If you wish to write these statements manually, see Statement templates.
To edit your bucket policy, navigate to the bucket page in the Amazon S3 UI and click the Permissions tab. Under Bucket policy, click Edit.

Figure 3 - Edit bucket policy
On the Edit bucket policy page, click Add new statement and add each of the statements to your bucket policy. These statements are generated for you in the Observe UI, or you can fill them out manually from Statement templates. When you are done, click Save changes.

Figure 4 - “Add new statement” to your bucket policy
Statement templates¶
To allow Observe to write data to your S3 bucket, add the following three statements to the bucket policy. If you are using the Observe UI, these statements should be generated for you automatically with the templated values filled in.
Replace <ROLE FROM OBSERVE UI> with the role provided in the Observe UI.
Replace <YOUR BUCKET NAME> with the name of your bucket.
To grant access only to a specific sub-path of your bucket, use <PATH> where indicated.
Statement 1¶
Allow Observe to get the location of your bucket.
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "<ROLE FROM OBSERVE UI>"
  },
  "Action": "s3:GetBucketLocation",
  "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>"
}
Statement 2¶
Allow Observe to list the objects within your bucket.
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "<ROLE FROM OBSERVE UI>"
  },
  "Action": "s3:ListBucket",
  "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>",
  "Condition": {
    "StringLike": {
      "s3:prefix": "[<PATH>/]*"
    }
  }
}
Statement 3¶
Allow Observe to write and delete objects from your bucket. Object deletion is only used in special error-handling cases to ensure no data is double-exported.
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "<ROLE FROM OBSERVE UI>"
  },
  "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion"
  ],
  "Resource": "arn:aws:s3:::<YOUR BUCKET NAME>/[<PATH>/]*"
}
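If you prefer to manage the policy as code rather than pasting statements into the S3 console, the three statement templates can be assembled into one policy document. The following is a minimal sketch; the role ARN, bucket name, and sub-path below are hypothetical placeholders, not values from the Observe UI:

```python
import json

# Hypothetical values -- substitute the role ARN shown in the Observe UI,
# your bucket name, and (optionally) a sub-path prefix.
OBSERVE_ROLE = "arn:aws:iam::111122223333:role/observe-export-example"
BUCKET = "my-export-bucket"
PATH = "observe-exports"  # optional sub-path; use "" to allow the whole bucket

prefix = f"{PATH}/*" if PATH else "*"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Statement 1: get the location of the bucket
            "Effect": "Allow",
            "Principal": {"AWS": OBSERVE_ROLE},
            "Action": "s3:GetBucketLocation",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {   # Statement 2: list objects under the export prefix
            "Effect": "Allow",
            "Principal": {"AWS": OBSERVE_ROLE},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": prefix}},
        },
        {   # Statement 3: write, read, and (for error handling) delete objects
            "Effect": "Allow",
            "Principal": {"AWS": OBSERVE_ROLE},
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
            ],
            "Resource": f"arn:aws:s3:::{BUCKET}/{prefix}",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

The printed JSON can then be attached to the bucket, for example with the AWS CLI's put-bucket-policy command or an infrastructure-as-code tool.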
Export Job Details¶
All export jobs have the following attributes:
State: Active or Failed. You can hover over the Failed state to learn more about the failure reason.
Job/Description: The name and description of the export job.
Dataset: The name of the dataset you are exporting.
Destination: The S3 bucket to which you are exporting your data.
Older than: The age at which data is exported, as determined from its arrival in the dataset.
Earliest timestamp: The timestamp of the oldest event that has been exported.
Latest export: The timestamp of the most recent event that has been exported.
Created by: The name of the user who created the export job, and the export job creation date.
Once a data export job is created, the resulting folder structure that Observe will create is:
/<observe_customerId>/<observe_jobId>/YYYY/MM/DD/HH_MI_SS/data_<observe_queryid>_0_0_0.ndjson.gz|snappy.parquet
Depending on the format you selected for the export job, file names end in ndjson.gz (NDJSON) or snappy.parquet (Parquet).
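When post-processing exported objects, the folder structure above can be split back into its components. A small sketch, using a hypothetical customer ID, job ID, and query ID for illustration:

```python
from datetime import datetime

def parse_export_key(key: str) -> dict:
    """Split an Observe export object key into its path components.

    Expected layout (per the documented folder structure):
    <customerId>/<jobId>/YYYY/MM/DD/HH_MI_SS/data_<queryId>_0_0_0.<ext>
    """
    parts = key.strip("/").split("/")
    customer_id, job_id, year, month, day, hms, filename = parts
    hour, minute, second = hms.split("_")
    return {
        "customer_id": customer_id,
        "job_id": job_id,
        "exported_at": datetime(int(year), int(month), int(day),
                                int(hour), int(minute), int(second)),
        "filename": filename,
    }

# Hypothetical key following the documented layout:
info = parse_export_key("101/42/2024/06/01/13_05_27/data_abc123_0_0_0.ndjson.gz")
print(info["job_id"], info["exported_at"])
```

This lets a downstream consumer, for example, group exported files by job or filter them by export time without reading the objects themselves.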
Export Job State¶
Export jobs can be in either an Active or Failed state. Hovering over the state pill provides additional details about why the export job failed. If your job failed because your bucket is in the wrong region, or because of a misconfigured IAM policy, you can hover over the failure to retry the job. For example, if your export job fails due to a misconfigured IAM policy, update the bucket policy and then retry.
Note that it can take up to 90 seconds for the export job state to reflect Failed or Active after creation or retry.

Figure 5 - Failed Export Job
Export Job Deletion¶
Export jobs can be deleted by hovering over the row of the job you want to delete, and clicking the trash-can icon on the right-hand side. You will be presented with a confirmation dialog. Note that deleting an export job does not delete data that has already been exported.

Figure 6 - Delete Export Job