Data Export

Warning

Data Export is an advanced feature currently in Public Preview; please file a support ticket if you wish to participate in the preview program. We are also rolling out to new regions regularly, so please check whether your region is currently supported.

Overview

Observe provides industry-leading retention, but there are scenarios where you may wish to move data from Observe to an AWS S3 bucket owned by your organization. You can use Observe’s Data Export feature to move data automatically from any Event Dataset to an S3 bucket via two distinct job types: duplicate data and post-retention data.

S3 Data Export is a good fit for the following scenarios:

  • You need to retain data for compliance purposes, but it does not need to be searchable.

  • You need to share subsets of Observe data with other teams.

  • You need portability of your data post-ingestion.

When using Data Export, be aware of the following prerequisites:

  • The S3 bucket must be in the same region as your Observe tenant.

  • Exports can lag event arrival by up to 2 hours.

  • You can only export data from Event Datasets.

  • Hibernation cannot be applied to datasets that are associated with an S3 Export job.

Export Jobs

To create an export job, navigate to the Account Settings page of your Observe tenant and select the Data export option from the left-nav. Export jobs support exporting in either NDJSON format (gzip compression) or Parquet format (snappy compression). All export job names must be unique, and all export Destination values must have a trailing / character, as in the hypothetical example below. There are two export job types, described in the sections that follow.
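For illustration only, a hypothetical Destination value with the required trailing slash (the bucket name and path are placeholders; the Observe UI shows the exact format it expects):

s3://my-export-bucket/observe-exports/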

Duplicate data

This export job continuously copies data from the selected Dataset to the designated S3 bucket, starting from the time of job creation and exporting new data as it arrives.

Post-retention data

This export job copies data from the selected Dataset to the designated S3 bucket after the data has reached its retention limit.

Configuring your S3 Bucket for Data Export

To use the Data Export feature, you must ensure the following:

  1. Your bucket must be in the same AWS region as your Observe tenant.

  2. Your bucket policy must allow Observe to access your bucket.

Bucket Region

Your bucket must be in the same region as your Observe tenant. When creating a job, the Observe UI modal will display the region that your bucket must be located in.

To view the region your bucket is in, navigate to the bucket page in the Amazon S3 UI. Click on the Properties tab. The bucket region is listed under AWS Region in the Bucket overview section, as shown in Figure 1.

Figure 1 - View bucket region
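You can also check the region programmatically. A minimal sketch using boto3, assuming your AWS credentials are configured (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# get_bucket_location reports None as the LocationConstraint for
# buckets in us-east-1, so normalize that case before comparing.
resp = s3.get_bucket_location(Bucket="my-export-bucket")  # placeholder name
print(resp["LocationConstraint"] or "us-east-1")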

If your bucket is not in the required region, we recommend creating a new bucket in that region.

On the Create bucket page in the Amazon S3 UI, the region of the bucket can be configured by selecting from the drop-down menu at the top of the screen, as shown in Figure 2.

Figure 2 - Create bucket, choose region
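The same can be done programmatically. A minimal boto3 sketch for creating a bucket in a specific region (the bucket name and region are placeholders; use the region shown in the Observe UI):

import boto3

region = "us-west-2"  # placeholder: use the region shown in the Observe UI
s3 = boto3.client("s3", region_name=region)

# Outside us-east-1, the target region must be passed as a
# LocationConstraint; for us-east-1, omit CreateBucketConfiguration.
s3.create_bucket(
    Bucket="my-export-bucket",  # placeholder name
    CreateBucketConfiguration={"LocationConstraint": region},
)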

Bucket Policy

For the Data Export feature to work, your S3 bucket policy must allow Observe to access your bucket and perform several basic actions. This can be done by adding statements to your bucket policy. The Observe UI should generate the statements that you need to add to your policy. If you wish to write these statements manually, see Statement templates.

To edit your bucket policy, navigate to the bucket page in the Amazon S3 UI. Click on the Permissions tab. Under Bucket policy, click Edit.

Figure 3 - Edit bucket policy

On the Edit bucket policy page, click Add new statement and add each of the statements to your bucket policy. These statements should be generated for you in the Observe UI, or you can fill them out manually from Statement templates. When you are done, click Save changes.

Figure 4 - “Add new statement” to your bucket policy

Statement templates

To allow Observe to write data to your S3 bucket, add the following three statements to the bucket policy. If you are using the Observe UI, these statements should be generated for you automatically with the templated values filled in.

  • Replace <ROLE FROM OBSERVE UI> with the role provided in the Observe UI.

  • Replace <YOUR BUCKET NAME> with the name of your bucket.

  • To grant access only to a specific sub-path of your bucket, substitute that path for <PATH> where indicated; the square brackets mark the [<PATH>/] portion as optional.

Statement 1

Allow Observe to get the location of your bucket.

{
	"Effect": "Allow",
	"Principal": {
		"AWS": "<ROLE FROM OBSERVE UI>"
	},
	"Action": "s3:GetBucketLocation",
	"Resource": "arn:aws:s3:::<YOUR BUCKET NAME>"
}

Statement 2

Allow Observe to list the objects within your bucket.

{
	"Effect": "Allow",
	"Principal": {
		"AWS": "<ROLE FROM OBSERVE UI>"
	},
	"Action": "s3:ListBucket",
	"Resource": "arn:aws:s3:::<YOUR BUCKET NAME>",
	"Condition": {
		"StringLike": {
			"s3:prefix": "[<PATH>/]*"
		}
	}
}

Statement 3

Allow Observe to write and delete objects from your bucket. Object deletion is only used in special error-handling cases to ensure no data is double-exported.

{
	"Effect": "Allow",
	"Principal": {
		"AWS": "<ROLE FROM OBSERVE UI>"
	},
	"Action": [
		"s3:PutObject",
		"s3:GetObject",
		"s3:GetObjectVersion",
		"s3:DeleteObject",
		"s3:DeleteObjectVersion"
	],
	"Resource": "arn:aws:s3:::<YOUR BUCKET NAME>/[<PATH>/]*"
}
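If you manage bucket policies outside the AWS console, the three statements can also be applied programmatically. A minimal boto3 sketch that assembles them into one policy document and applies it, assuming the bucket has no existing policy (the role ARN and bucket name are placeholders, and the optional <PATH> prefix is omitted in favor of the whole bucket):

import json

import boto3

ROLE = "<ROLE FROM OBSERVE UI>"  # placeholder: role ARN from the Observe UI
BUCKET = "my-export-bucket"      # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Statement 1: let Observe get the bucket location.
        {
            "Effect": "Allow",
            "Principal": {"AWS": ROLE},
            "Action": "s3:GetBucketLocation",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        # Statement 2: let Observe list objects (any prefix here).
        {
            "Effect": "Allow",
            "Principal": {"AWS": ROLE},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": "*"}},
        },
        # Statement 3: let Observe write, read, and delete objects.
        {
            "Effect": "Allow",
            "Principal": {"AWS": ROLE},
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
            ],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

# put_bucket_policy replaces any existing policy; merge manually if
# your bucket already has one.
boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))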

Export Job Details

All export jobs have the following attributes:

  • State: Active or Failed. You can hover over the Failed state to learn more about the failure reason.

  • Job / Description: The name and description of the export job.

  • Dataset: The name of the dataset you are exporting.

  • Destination: The S3 bucket to which you are exporting your data.

  • Older than: The age at which data is exported, measured from its arrival in the dataset.

  • Earliest timestamp: The timestamp of the oldest event that has been exported.

  • Latest export: The timestamp of the most recent event that has been exported.

  • Created by: The name of the user who created the export job, and the export job creation date.

Once a data export job is created, Observe writes objects to your bucket using the following folder structure:

/<observe_customerId>/<observe_jobId>/YYYY/MM/DD/HH_MI_SS/data_<observe_queryid>_0_0_0.ndjson.gz|snappy.parquet

Depending on the format you selected for the export job, file names end in either ndjson.gz (NDJSON) or snappy.parquet (Parquet).
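As an illustration of consuming an NDJSON export, here is a minimal boto3 sketch that lists one day of exported objects and decodes the first file (the bucket name, customer ID, job ID, and date are placeholders):

import gzip
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-export-bucket"  # placeholder bucket name
# Placeholder <observe_customerId>/<observe_jobId>/YYYY/MM/DD/ prefix.
PREFIX = "12345/67890/2024/01/15/"

# List the exported objects for that day.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Each NDJSON export is gzip-compressed, one JSON event per line.
key = resp["Contents"][0]["Key"]
body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
for line in gzip.decompress(body).splitlines():
    event = json.loads(line)
    print(event)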

Export Job State

Export jobs can be in either an Active or a Failed state. Hovering over the state pill provides additional details about why an export job has failed. If your job failed because your bucket is in the wrong region or because of a misconfigured IAM policy, you can hover over the failure to retry the job. For example, if your export job fails due to a misconfigured IAM policy, update the bucket policy and then retry. Note that it can take up to 90 seconds for the export job state to reflect Failed or Active after creation or retry.

Figure 5 - Failed Export Job

Export Job Deletion

To delete an export job, hover over the row of the job you want to delete and click the trash-can icon on the right-hand side. You will be presented with a confirmation dialog. Note that deleting an export job does not delete data that has already been exported.

Figure 6 - Delete Export Job