Amazon S3

Two types of data may be ingested from or about S3 buckets: the data files stored in a bucket, and the access logs recorded about a bucket.

Ingesting data using Filedrop

If you have data in Amazon S3 buckets, you can ingest the data into Observe using Filedrop. Observe ingests files in JSON, CSV, TEXT, or Apache Parquet format. Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

Note

Filedrop supports Apache Parquet, JSON, CSV, and TEXT. Files larger than 1GB are not supported.

For JSON format, Filedrop supports a single object, an array of objects, or newline delimited objects.

For CSV format, the first row in the file must be the header.

For CSV format, you can specify a delimiter by setting "--content-type application/x-csv;delimiter=delim".

For TEXT format, use "--content-type text/plain".
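For example, newline-delimited JSON (one object per line) is one of the accepted JSON shapes. The following is a minimal sketch using only the Python standard library; the record fields are illustrative only:

```python
import json

# Sample records to ingest; the field names are illustrative only.
records = [
    {"host": "web-1", "status": 200, "path": "/index.html"},
    {"host": "web-2", "status": 404, "path": "/missing"},
]

def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"

# Write a file suitable for uploading to a Filedrop destination.
with open("events.json", "w") as f:
    f.write(to_ndjson(records))
```

A single JSON object, or a JSON array of objects, would be accepted as well, per the note above.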

Create a filedrop from a datastream

  1. Log into your Observe instance and click Datastreams on the Navigation bar.

  2. From the list of Datastreams, select a datastream you want to use or create a new one.

  3. Click Create and select Filedrop.

  4. Enter a name for the Filedrop, or leave the field blank to generate a name automatically.

  5. Add an optional Description.

  6. For the AWS IAM Role field, follow the AWS format (arn:aws:iam::<your-aws-account>:role/<role-name>) and specify a role that does not exist yet; you create this role later. For example, the role could be arn:aws:iam::1234567890:role/observe-filedrop.

Creating a filedrop

Once your Observe Filedrop is created, take note of the following properties on the details page:

Filedrop config
  1. IAM Role Name: Use the name you provided during the Filedrop creation. This should be the suffix part of the role ARN. (observe-filedrop)

  2. Destination URI: The S3 URI where data will be written, typically starting with your Observe customer ID followed by s3alias. (s3://143133782262-sidzios9e7pn6kpen5z6i3p5epxxgusw2a-s3alias/ds1slREpdJ9kDtmcL9xH/)

  3. S3 Access Point ARN: Noted during the Filedrop setup, it grants the Lambda permission to write to Filedrop. (arn:aws:s3:us-west-2:158067661102:accesspoint/143133782262)

Create a CloudFormation stack based on a template

Pick a forwarder template from CloudFormation Quick-Create Links based on your AWS region. For example, to create a CloudFormation stack for a forwarder in us-west-2, use the highlighted one in the following image.

Pick a proper forwarder template.

You will be redirected to the AWS CloudFormation console to complete creating the stack. Fill out the required fields based on the three values noted in the previous step, and specify the S3 bucket or buckets you want Observe to read files from.

  1. IAM Role Name: observe-filedrop

  2. Destination URI: s3://143133782262-sidzios9e7pn6kpen5z6i3p5epxxgusw2a-s3alias/ds1slREpdJ9kDtmcL9xH/

  3. S3 Access Point ARN: arn:aws:s3:us-west-2:158067661102:accesspoint/143133782262

  4. S3 buckets: observe-filedrop-example-bucket

Create a CloudFormation stack.

Acknowledge the required access capabilities and click on Create stack.

Enable Amazon EventBridge notification

Go to the properties tab for the S3 buckets you specified in the previous step and enable notifications to Amazon EventBridge.

Enable EventBridge.
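The console toggle above can also be set programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the bucket name is a placeholder:

```python
def eventbridge_notification_config():
    # An empty EventBridgeConfiguration block enables sending all S3
    # event notifications for the bucket to Amazon EventBridge.
    return {"EventBridgeConfiguration": {}}

def enable_eventbridge(bucket_name):
    import boto3  # imported here so the config helper has no dependencies
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=eventbridge_notification_config(),
    )

# enable_eventbridge("observe-filedrop-example-bucket")
```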

You are all set! Upload files to the S3 buckets, or wait until new files arrive in them. After that, you can see records from these files in the datastream. To learn more about how files are sent to Observe from your S3 buckets, see https://github.com/observeinc/aws-sam-apps/blob/main/docs/forwarder.md

Create a filedrop from a datastream

As an alternative to CloudFormation, you can deploy the forwarder with Terraform. Create the Filedrop from a datastream as described above, and note the same three properties (IAM Role Name, Destination URI, and S3 Access Point ARN) from the details page.

Create the following module in Terraform

  1. name: Name of IAM role expected by Filedrop. This name will also be applied to the SQS Queue and Lambda Function processing events. In the absence of a value, the stack name will be used. (observe-filedrop)

  2. arn: The access point ARN for your Filedrop. (arn:aws:s3:us-west-2:158067661102:accesspoint/143133782262)

  3. bucket: 143133782262-sidzios9e7pn6kpen5z6i3p5epxxgusw2a-s3alias

  4. prefix: ds1slREpdJ9kDtmcL9xH/

  5. source_bucket_names: Specify the S3 bucket or buckets you want Observe to read files from (observe-filedrop-example-bucket)

terraform {
  required_providers {
    observe = {
      source  = "terraform.observeinc.com/observeinc/observe" 
      version = "~> 0.14.16"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.60"
    }
  }  
}

module "observe" {
  source  = "observeinc/collection/aws//modules/forwarder"
  version = ">= 2.10"

  name = "observe-filedrop"

  destination = {
    arn    = "arn:aws:s3:us-west-2:158067661102:accesspoint/143133782262"
    bucket = "143133782262-sidzios9e7pn6kpen5z6i3p5epxxgusw2a-s3alias"
    prefix = "ds1slREpdJ9kDtmcL9xH/"
  }

  source_bucket_names = [ "observe-filedrop-example-bucket" ]
}

Enable Amazon EventBridge notification

Go to the properties tab for the S3 buckets you specified in the previous step and enable notifications to Amazon EventBridge.

Enable EventBridge.

You are all set! Upload files to the S3 buckets, or wait until new files arrive in them. After that, you can see records from these files in the datastream.

To ingest data with Observe Filedrop from an S3 bucket, you create an AWS IAM user and role, create a policy, and associate them with the bucket.

Create a User and Role for Filedrop

Create an AWS IAM User with permission to add files to an AWS S3 bucket.

Creating an AWS IAM User

Figure 1 - Creating an AWS IAM User

Create an AWS role with a custom Trust Policy using the following example policy:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "Statement1",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::123456789012:user/observe-filedrop-demo"
			},
			"Action": "sts:AssumeRole"
		}
	]
}

Replace 123456789012 with your AWS Account ID.

Adding a custom Trust Policy

Figure 2 - Adding a custom Trust Policy

Add a Principal.

Adding a Principal

Figure 3 - Adding a Principal

You have created an AWS IAM Role with an AWS IAM User.

Create a Policy for Filedrop

Next, create a policy with the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging"
            ],
            "Resource": "arn:aws:s3:::observe-filedrop/*"
        }
    ]
}

If you want to send all files or objects under the specified directory or prefix, you need to add s3:ListBucket as an additional action and arn:aws:s3:::observe-filedrop as an additional resource to the observe-filedrop-policy-demo policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::observe-filedrop",
                "arn:aws:s3:::observe-filedrop/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "s3:DataAccessPointArn": "arn:aws:s3:us-west-2:123456789012:accesspoint/123456789012"
                }
            }
        }
    ]
}

Specifying Permissions for the Policy

Figure 4 - Specifying Permissions for the Policy

Attach the policy to the Filedrop demo role.

Attaching the Policy to the Filedrop Demo Role

Figure 5 - Attaching the Policy to the Filedrop Demo Role

Add permissions to the role.

Adding Permissions to the Filedrop Demo Role

Figure 6 - Adding Permissions to the Filedrop Demo Role

Create an Access Key for Filedrop

Create an AWS Access Key.

Creating an AWS Access Key

Figure 7 - Creating an AWS Access Key

Select Command Line Interface (CLI) from the list of use cases.

Confirm that you want to create an access key and click Next.

Save the access key and secret access key.

Configuring the AWS CLI

At the AWS command prompt, use the following to configure AWS:

$ aws configure
AWS Access Key ID [****************LNZP]: AKIARKQVG-<redacted>
AWS Secret Access Key [****************vgsO]: <redacted>
Default region name [us-west-2]: us-west-2
Default output format [json]: json

Associate the Access Key with the Filedrop Profile

Create an AWS profile called filedrop.

vim ~/.aws/credentials
[default]
aws_access_key_id = AKIARKQVG-<redacted>
aws_secret_access_key = <redacted>

[filedrop]
role_arn=arn:aws:iam::123456789012:role/observe-filedrop-role-demo
source_profile=default
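To confirm that the filedrop profile can assume the role, you can check which identity it resolves to. The following is a minimal sketch, assuming boto3 is installed; the role name is the demo value used above:

```python
def role_name_from_assumed_arn(arn):
    """Extract the role name from an assumed-role ARN of the form
    arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>."""
    return arn.split("/")[1]

def verify_profile(profile="filedrop"):
    import boto3  # imported here so the parser above has no dependencies
    arn = boto3.Session(profile_name=profile) \
               .client("sts").get_caller_identity()["Arn"]
    return role_name_from_assumed_arn(arn)

# verify_profile() is expected to report the role name,
# e.g. observe-filedrop-role-demo, if the profile is set up correctly.
```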

Creating a Filedrop in Observe

  1. Log into your Observe instance and click Datastreams on the Navigation bar.

  2. From the list of Datastreams, select a datastream you want to use or create a new one.

  3. Click Create and select Filedrop.

  4. Enter a name for the Filedrop, or leave the field blank to generate a name automatically.

  5. Add an optional Description.

  6. Add the AWS IAM Role that you created previously, in the format arn:aws:iam::<account>:role/<role-name>.

  7. For the AWS Region, only us-west-2 is available for Filedrop.

Creating a Filedrop in Observe

Figure 8 - Creating a Filedrop in Observe

  8. Click Continue to create the Filedrop.

Use the AWS CLI cp command to copy a local file or S3 object into the Observe S3 access point. The command looks similar to the following:

aws s3 cp s3://<your file path> s3://<yourinstance>-s3alias/<S3-prefix>/ --profile filedrop

Example AWS CLI Commands

To send all files in the Spark directory, use the following example:

aws s3 cp s3://observe-filedrop/Spark/ s3://123456789012-ab1cdE2FGhiJKlmnop34Q:rstUv5w6Xy7z8AB_CdeFg9h0iJK1mnOPq-s3alias/ab12CD34eFghI4p5jk6m/ --profile filedrop --content-type text/plain --recursive

To send a tab-delimited file, use the following example command:

aws s3 cp s3://observe-filedrop/MOCK_DATA.txt s3://123456789012-ab1cdE2FGhiJKlmnop34Q:rstUv5w6Xy7z8AB_CdeFg9h0iJK1mnOPq-s3alias/ab12CD34eFghI4p5jk6m/ --profile filedrop --content-type application/x-csv;delimiter=tab

Alternatively, you can use the programming language of your choice and perform the s3:PutObject operation to drop files into the Observe S3 access point.

S3 Bucket (in us-west-2 region): <yourinstance>-s3alias
S3 prefix: <S3-prefix>
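The following is a minimal sketch of this approach using boto3 (assumed installed); the bucket alias and prefix are the placeholder values from your Filedrop details page, and the content type should match your file format:

```python
def filedrop_key(prefix, filename):
    """Join the Filedrop S3 prefix and a file name into an object key."""
    return prefix.rstrip("/") + "/" + filename

def upload_to_filedrop(local_path, bucket_alias, prefix, profile="filedrop"):
    import boto3  # imported here so filedrop_key stays dependency-free
    s3 = boto3.Session(profile_name=profile).client("s3")
    key = filedrop_key(prefix, local_path.rsplit("/", 1)[-1])
    with open(local_path, "rb") as f:
        # PutObject against the access point alias, like `aws s3 cp` above.
        s3.put_object(Bucket=bucket_alias, Key=key, Body=f,
                      ContentType="application/json")
    return key

# upload_to_filedrop("events.json", "<yourinstance>-s3alias", "<S3-prefix>")
```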

Ingesting data files from an S3 bucket

Ingest objects uploaded to an S3 bucket using the Observe Lambda forwarder and Amazon S3 Event Notifications.

Warning

If your notification writes to the same bucket that triggers the notification, it could cause an execution loop. For example, if the bucket triggers a Lambda function each time an object is uploaded, and the function uploads an object to the bucket, then the function indirectly triggers itself. To avoid this, use two buckets, or configure the trigger to only apply to a prefix used for incoming objects.

Granting S3 permissions to publish event notifications to Lambda

To publish event notification messages, the Amazon S3 principal must have permission to invoke the Observe Lambda Forwarder function. These permissions are configured for you when you enable event notifications on a bucket, as described below. (For more information, see Granting permissions to invoke an AWS Lambda function in the AWS documentation.)

Enabling notifications in the S3 console

  1. Navigate to S3 in the AWS Console.

  2. Select the bucket from which you want to forward data.

  3. Click on Properties.

  4. Under Event notifications, click Create event notification.

  5. In the General configuration section:

    • Enter a description in Event name. If not provided, AWS generates a globally unique identifier (GUID) to use for the name.

    • If desired, provide a Prefix to filter event notifications by prefix. For example, you may use a prefix filter to receive notifications only when files are added to a specific folder (like images/.)

    • Similarly, filter event notifications by suffix by providing a value for Suffix. (Optional)

    For more information, see Configuring event notifications using object key name filtering.

  6. Under Event types, select the event types for which you want to receive notifications.

    • Observe recommends All object create events.

  7. In the Destination section:

    • Choose Lambda function as the event notification destination.

    • In the Lambda function dropdown, choose the name of your Observe Lambda Forwarder function.

  8. Click Save.

See the AWS S3 Documentation for full details.

Granting the Observe Lambda Forwarder permissions to access your S3 Bucket

A Lambda function has a policy, called an execution role, that grants permission to access AWS services and resources. In order to GET Objects out of an S3 bucket in response to an Event Notification, your Observe Lambda Forwarder must have permission to access the S3 bucket.

  1. Navigate to Lambda in the AWS Console.

  2. Select the Observe Lambda function (created by the forwarder or integration installation process).

  3. Select the Configuration tab.

    • Select Permissions from the left menu.

    • Under Execution Role, choose the Role name. This displays the role details in a new IAM console tab.

    Lambda permissions configuration

    Figure 9 - Lambda Permissions Configuration

  4. In the Permissions tab, click on AllowS3Read policy. If you don’t see this policy, click Show more to show hidden policies.

    • Click Edit policy and then the JSON tab.

    Editing Lambda policy in the UI

    Figure 10 - Editing the Lambda policy

    • Extend the Resource section with an entry for each S3 bucket you want to forward events from, as in the following example:

      Example:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Action": [
                      "s3:Get*",
                      "s3:List*"
                  ],
                  "Resource": [
                      "arn:aws:s3:::observe-collection-bucket-123abc",
                      "arn:aws:s3:::observe-collection-bucket-123abc*",
                      "arn:aws:s3:::additional-bucket-1",
                      "arn:aws:s3:::additional-bucket-1*",
                      "arn:aws:s3:::log-bucket-2",
                      "arn:aws:s3:::log-bucket-2*"
                  ],
                  "Effect": "Allow"
              }
          ]
      }
      
  5. Click Review Policy.

  6. Click Save changes.

For each log bucket ("Target bucket"), add a trigger so the forwarder can send new files as they are generated.

Forwarding logs using Lambda

If necessary, install the Observe AWS Integration or the standalone Observe Lambda forwarder following the instructions in the documentation.

If you currently use the Lambda forwarder, you do not need to install it again.

Add a trigger for each log bucket (Target bucket) so the forwarder can send access logs as they are generated.

  1. Navigate to Lambda in the AWS Console.

  2. Select the Observe Lambda function (created by the forwarder or integration installation process).

  3. Select Add Trigger, then search for S3.

    Type S3 in the form and select it to add an S3 trigger

    Figure 11 - Adding the S3 trigger

  4. Configure the trigger with the following settings:

    • Bucket - the log bucket

    • Event type - the desired events to send, such as All object create events

    • Prefix or Suffix if desired. (Optional)

  5. Choose Add to save.

Amazon S3 Bucket Access Logs

Observe can collect the access logs for Amazon S3 buckets for security observability purposes. First, review collecting data from an S3 bucket with Observe; that document describes how to use Observe's Filedrop feature or Lambda Forwarder to collect data. When server access logging is enabled, Amazon S3 writes the access logs to a target bucket, from which they can be collected.

Enabling S3 access logging

S3 bucket access logging is disabled by default. If needed, first enable logging for the desired bucket:

  1. Navigate to S3 in the AWS Console.

  2. Select the bucket for which you want access logs.

  3. Choose Properties.

  4. Under Server access logging, choose Edit.

  5. Select Enable and provide the log destination bucket in Target bucket.

  6. Choose Save changes.

Editing server access logging in the AWS Console

Figure 1 - Editing server access logging in the AWS Console

See the AWS access logging documentation for full details.
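The console steps above can also be scripted. The following is a minimal sketch using boto3 (assumed installed); the bucket names are placeholders:

```python
def access_logging_status(target_bucket, target_prefix="access-logs/"):
    # BucketLoggingStatus payload for enabling server access logging;
    # the target prefix is an illustrative default.
    return {"LoggingEnabled": {"TargetBucket": target_bucket,
                               "TargetPrefix": target_prefix}}

def enable_access_logging(bucket_name, target_bucket):
    import boto3  # imported here so the payload helper has no dependencies
    boto3.client("s3").put_bucket_logging(
        Bucket=bucket_name,
        BucketLoggingStatus=access_logging_status(target_bucket),
    )

# enable_access_logging("my-data-bucket", "my-log-destination-bucket")
```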