Example: Use unified hosts-domains and URL threatlists


In this example, you explore using the Unified Hosts-Domains Threatlist and Unified URL Threatlist against data that contains Web activity URLs.

You cover the following topics in this tutorial:

Shape the data

You have a Dataset with a FIELDS column with the following data:

{"source":"weblogs","timestamp":"1684950343.405","url":"https://docs.observeinc.com/en/latest/content/common-topics/GS-observe.html"}

  1. Click on Untitled Worksheet and rename it to Example/Host-Domain-URL IOC Matches.
  2. Click on Stage 1 and rename it to Form Data.

Extract fields from the JSON payload

  1. Open the menu for the FIELDS column and select Extract from JSON.
  2. From the right menu, select the following fields. Some fields may be in nested objects:
  • source
  • timestamp
  • url
  1. Click Apply.

The extraction creates the following OPAL:

// Note we use a lower function on the url to avoid case matching issues later
make_col source:string(FIELDS.source),
    timestamp:string(FIELDS.timestamp),
    url:lower(string(FIELDS.url))

Form timestamp from event data

  1. You want to designate the timestamp, in Epoch seconds, from the data as the event timestamp. The JSON extraction creates a string by default. You want to change it to a valid timestamp object.

Change the string function to the from_seconds function to create a proper timestamp object type.

// Extract fields from JSON
make_col source:string(FIELDS.source),
    timestamp:from_seconds(FIELDS.timestamp),
    url:lower(string(FIELDS.url)) 
  1. Add the following OPAL to set a set_valid_from. For better performance, don’t use a new value that deviates too much from the original ingest time.
// Set valid from for the data
set_valid_from options(max_time_diff:4h), timestamp

Form host field from the URL field

  1. Next you extract the host value from the url because you want to check this value as well as the full url value against the Threat Intelligence IOCs. Add the following OPAL code:
// extract host from url
extract_regex url, /\/(?P<url_host>[^\/:]+)[:\/]?/
  1. Add the following OPAL to reduce the columns to only the fields of interest.
// Keep fields of interest
pick_col timestamp,
    source,
    url,
    url_host
  1. Click RUN.

  1. Click Save Worksheet.

Match the url_host field to the IOCs

  1. Minimize the OPAL console so you can select Link New Stage.
  2. Rename the new Stage Host IOC Matching.
  3. Open the OPAL console.
  4. Click the Inputs tab.
  5. Search and add Unified Hosts-Domains Threatlist.
  6. Note the name of the added Input as you use it in the join statement.
  1. Click the OPAL tab and add the following OPAL code.

// Note we use the lower() function to ensure values match consistently

join on(lower(url_host)=lower(@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_domain)),
  tip_match:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_domain,
  tip_provider:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_provider,
  tip_provider_id:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_provider_url,
  tip_severity:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_severity,
  tip_tags:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_tags,
  tip_source:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_source,
  tip_tlp:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_tlp,
  tip_category:@"Threat_Intel_Basic/Unified Hosts-Domains Threatlist".tip_category,

  tip_match_field:"url_host"
  1. Click Run.
  2. Click Save worksheet.

Match the url field to IOCs

  1. Minimize the OPAL console.
  2. Return to the Stage Form Data and select Link New Stage. This links you to the original data.
  3. Rename the new Stage URL IOC Matching.
  4. Open the OPAL console.
  5. Click Inputs.
  6. Search and add the Unified URL Threatlist dataset.
  7. Note the name of the added Input as you use in the join statement.
  1. Click the OPAL tab and update it using the following OPAL code.

// Note we use the lower() function to ensure values match consistently

join on(lower(url)=lower(@"Threat_Intel_Basic/Unified URL Threatlist".tip_url)),
  tip_match:@"Threat_Intel_Basic/Unified URL Threatlist".tip_url,
  tip_provider:@"Threat_Intel_Basic/Unified URL Threatlist".tip_provider,
  tip_provider_id:@"Threat_Intel_Basic/Unified URL Threatlist".tip_provider_url,
  tip_severity:@"Threat_Intel_Basic/Unified URL Threatlist".tip_severity,
  tip_tags:@"Threat_Intel_Basic/Unified URL Threatlist".tip_tags,
  tip_source:@"Threat_Intel_Basic/Unified URL Threatlist".tip_source,
  tip_tlp:@"Threat_Intel_Basic/Unified URL Threatlist".tip_tlp,
  tip_category:@"Threat_Intel_Basic/Unified URL Threatlist".tip_category,
  tip_match_field:"url"

Merge url and url_host IOC matches

You now have two different stages from the main data matching each unified list.

  1. To merge the results, locate the stage URL IOC Matching and select Link New Stage.
  2. Rename the new Stage Merged Results.
  3. On the OPAL console, click Inputs.
  4. Search and add the Host IOC Matching.
  5. Note the name of the added Input as you use it in the union statement.
  1. Add the following OPAL to combine the results in the stage.
//merge in the host IOC matches
union @"Host IOC Matching_-vlua"

// use a statsby to reduce to what all matches by the URL in case you get matches on host and full url

statsby timestamp:last(timestamp),
  tip_match:string_agg_distinct(tip_match,","),
  tip_match_field:string_agg_distinct(tip_match_field,","),
  tip_provider:string_agg_distinct(tip_provider,","),
  tip_provider_id:string_agg_distinct(tip_provider_id,","),
  tip_severity:string_agg_distinct(tip_severity,","),
  tip_tags:array_agg_distinct(tip_tags),
  tip_source:string_agg_distinct(tip_source,","),
  tip_tlp:string_agg_distinct(tip_tlp,","),
  tip_category:string_agg_distinct(tip_category,","),
  group_by(url)

You have now completed the usage example. You could also use this Worksheet to make a Dataset and subsequent Monitor if so desired.