Tutorial: Search for improbable travel

This tutorial shows you how to implement a search for the classic security authentication problem of “improbable travel”, also known as “the Superman problem”. When user activity is tagged with geographic locations and times, you can analyze those values to determine how quickly a person would have needed to move between the locations: could they realistically have covered the distance in the time between the events? Successful authentication events that imply a speed of more than 500 mph (about 804 km/h) are potentially useful alerts for this use case, though there are some potential false positive cases that we will discuss.
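To make the check described above concrete, here is a minimal Python sketch of the arithmetic, independent of Observe. It assumes a mean Earth radius of 6371 km (the radius used by haversine_distance_km may differ slightly), and the function names and the New York/London sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius
SPEED_LIMIT_KMH = 804.0   # roughly 500 mph, the threshold used in this tutorial


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def required_speed_kmh(lat1, lon1, lat2, lon2, hours):
    """Speed needed to cover the distance between two events `hours` apart."""
    return haversine_km(lat1, lon1, lat2, lon2) / hours


# Hypothetical example: a login in New York followed 2 hours later by one in London.
speed = required_speed_kmh(40.7128, -74.0060, 51.5074, -0.1278, hours=2.0)
is_improbable = speed >= SPEED_LIMIT_KMH
print(f"{speed:.0f} km/h, improbable: {is_improbable}")
```

A pair of events is flagged only when the implied speed reaches the limit, so the same two cities would not be flagged if the logins were a full day apart.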

Requirements:

  1. You will need an event Dataset that has fields for a user’s action (such as authentication records) and the user’s latitude and longitude. The location fields should be type float64 for use with the OPAL function haversine_distance_km, which computes the distance between two locations.

  2. If latitude and longitude are not already present in your data, they can be produced from IPv4 addresses using Observe’s lookup_ip_info verb.

  3. You will want to review the rest of the data set for more distinguishing fields to group by in order to increase the strength of your search. For example, you might also group by the user’s device so that you don’t unintentionally blend laptop and mobile phone activity, which can create a false positive.
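The effect of the grouping choice in requirement 3 can be sketched in Python. This is an illustration under stated assumptions, not Observe code: the field names (user, device, hour, lat, long), the helper max_speed_by_group, and the sample events are all hypothetical. The data models a laptop that geolocates to New York and a phone that geolocates to Chicago, used alternately by the same user.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_speed_by_group(events, key_fields):
    """Return the highest speed (km/h) between successive events in each group."""
    groups = defaultdict(list)
    for e in events:
        groups[tuple(e[f] for f in key_fields)].append(e)
    result = {}
    for key, rows in groups.items():
        rows.sort(key=lambda e: e["hour"])
        speeds = [
            haversine_km(p["lat"], p["long"], c["lat"], c["long"]) / (c["hour"] - p["hour"])
            for p, c in zip(rows, rows[1:])
        ]
        result[key] = max(speeds, default=0.0)
    return result


# Hypothetical events: times are hours since the first event.
events = [
    {"user": "alice", "device": "laptop", "hour": 0.0, "lat": 40.7128, "long": -74.0060},
    {"user": "alice", "device": "phone", "hour": 0.5, "lat": 41.8781, "long": -87.6298},
    {"user": "alice", "device": "laptop", "hour": 1.0, "lat": 40.7128, "long": -74.0060},
    {"user": "alice", "device": "phone", "hour": 1.5, "lat": 41.8781, "long": -87.6298},
]

by_user = max_speed_by_group(events, ["user"])
by_user_and_device = max_speed_by_group(events, ["user", "device"])
print(by_user)             # blended devices imply very fast travel
print(by_user_and_device)  # each device stays in one place
```

Grouping by user alone blends the two devices and implies repeated high-speed trips, while grouping by user and device shows that neither device moved.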

Choosing the Source Dataset

Identify the desired fields in your selected Dataset. In this example data, we group by the src field as our device/user of interest and use src_lat and src_long for the latitude and longitude values.

When planning an improbable travel Dataset, consider the desired time range and granularity. Reviewing a shorter window of data, such as four hours, helps balance alert sensitivity with performance. It may also be best to avoid summarizing verbs such as timechart or statsby, so that the details of individual records are not missed. Note that these verbs can be quite useful for related use cases, such as a dashboard showing the amount of traffic per region.

Figure 1 - Source Data for the Improbable Travel search

Finding Travel Distance and Speed

To solve the improbable travel problem, we will add two geographic metrics to the data:

  • Distance between successive events: travel_distance

  • Speed needed to traverse that distance: travel_speed

  1. Open your Dataset in a new worksheet

  2. Open the OPAL Console panel

  3. Use the following OPAL to create travel_distance and travel_speed

// Rename the source dataset columns to standardize them
rename_col lat:src_lat, long:src_long

// Filter for events with latitude
filter (not is_null(lat))

// Obtain the latitude, longitude, and timestamp of the previous event for each user
// lag(x, 1) fetches the value of x from the previous event
// The frame option (1440m = 24 hours) is required so that the metrics can be accelerated, which enables monitor setup
make_col lat_previous:window(lag(lat, 1), frame(back:1440m), group_by(user), order_by(timestamp)),
    long_previous:window(lag(long, 1), frame(back:1440m), group_by(user), order_by(timestamp)),
    timestamp_previous:window(lag(timestamp, 1), frame(back:1440m), group_by(user), order_by(timestamp))

// Compute the distance (km) between the previous and current locations
make_col distance:haversine_distance_km(
    lat, long,                    // Current event location
    lat_previous, long_previous   // Previous event location
    )

// Obtain the Duration type value between the previous and current timestamps
make_col event_duration:(duration(timestamp_previous, timestamp))
// Convert the Duration to hours
make_col event_duration_hr:event_duration/1h
// Compute the speed (km/h) from the distance (km) and the duration (hr)
make_col speed:distance/event_duration_hr

sort desc(timestamp),asc(user)

// Create metrics from speed and distance
make_col metrics:make_object("travel_speed":speed,"travel_distance":distance)
flatten_leaves metrics

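// Reduce to the columns needed to produce our metrics
// Note: the speed column holds the value of whichever metric the row describes
// (travel_speed or travel_distance)
pick_col BUNDLE_TIMESTAMP,
  valid_from:timestamp,
  user,
  src,
  metric:string(_c_metrics_path),
  speed:float64(_c_metrics_value)
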
interface "metric",
     metric:metric,
     value:speed

// Undocumented UI command that highlights rows at or above the speed limit (804 km/h)
highlight speed>=804

  4. Click Run and check that the values are correct.

Figure 2 - Adding Distance and Speed to the Improbable Travel search

  5. Click Publish New Dataset and name it (e.g. Improbable Travel Metrics), then click Publish to publish the new Dataset.
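For readers who want to check the logic of the OPAL above outside Observe, the following Python sketch mirrors its main steps: skip events without coordinates, look up each user's previous event, and derive travel_distance and travel_speed. It is an approximation under stated assumptions: the 1440-minute frame is read as a 24-hour lookback, a 6371 km Earth radius is assumed, events with no elapsed time are skipped to avoid dividing by zero, and the function travel_metrics and the sample events are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0          # assumption: mean Earth radius
FRAME = timedelta(minutes=1440)   # assumption: frame(back:1440m) read as a 24-hour lookback


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_metrics(events):
    """Return travel_distance (km) and travel_speed (km/h) per event, by user."""
    previous = {}  # user -> most recent located event
    out = []
    for e in sorted(events, key=lambda e: e["timestamp"]):
        if e["lat"] is None or e["long"] is None:
            continue  # same intent as the is_null filter
        p = previous.get(e["user"])
        previous[e["user"]] = e
        if p is None:
            continue  # first event for this user, nothing to compare with
        elapsed = e["timestamp"] - p["timestamp"]
        if elapsed <= timedelta(0) or elapsed > FRAME:
            continue  # no elapsed time, or previous event outside the lookback
        hours = elapsed / timedelta(hours=1)
        distance = haversine_km(p["lat"], p["long"], e["lat"], e["long"])
        out.append({
            "timestamp": e["timestamp"],
            "user": e["user"],
            "travel_distance": distance,
            "travel_speed": distance / hours,
        })
    return out


# Hypothetical sample events
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    {"timestamp": t0, "user": "alice", "lat": 40.7128, "long": -74.0060},                      # New York
    {"timestamp": t0 + timedelta(hours=1), "user": "alice", "lat": 51.5074, "long": -0.1278},  # London
    {"timestamp": t0, "user": "bob", "lat": 48.8566, "long": 2.3522},                          # Paris
    {"timestamp": t0 + timedelta(hours=3), "user": "bob", "lat": 52.5200, "long": 13.4050},    # Berlin
    {"timestamp": t0 + timedelta(hours=4), "user": "bob", "lat": None, "long": None},          # no location
]

metrics = travel_metrics(events)
for m in metrics:
    print(m["user"], round(m["travel_distance"]), "km", round(m["travel_speed"]), "km/h")
```

Comparing a few rows of the published Dataset against a calculation like this is one way to confirm that the values in step 4 look correct.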

Creating the Monitor

When we created the Dataset, we did not filter on the speed limit. In this step, we create a Monitor and add a filter for the speed limit.

  1. Open your saved Metrics Dataset

  2. Click Create Monitor

  3. Click Edit Monitored Dataset

  4. Open the OPAL Console and add the following to the end:

// Filter for travel_speed metric values at or above 804 km/h
filter metric = "travel_speed" and speed>=804

  5. Click Apply to update the monitored Dataset.

  6. Set the type to Count and the time window to the past hour.

Figure 3 - Setting up the Improbable Travel monitor

  7. Set the alert grouping to src, as this is our device/user field.

Figure 4 - Setting up group_by in the Improbable Travel monitor

  8. As these are simple point-in-time alerts, we do not need status updates.

Figure 5 - Setting options in the Improbable Travel monitor

  9. Name your Monitor, e.g. Improbable Travel Alert.

  10. Click Save to activate your Monitor.

  11. Review the generated alerts under Monitors, Alerts.

Figure 6 - Reviewing an alert from the Improbable Travel monitor
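
The monitor settings in this section (the travel_speed filter at 804 km/h, a Count over the past hour, and grouping by src) can be pictured with the small Python sketch below. It is an illustration only, not how Observe evaluates monitors. The function alert_counts, the record layout, and the sample values are hypothetical, and treating any count of one or more as an alert is an assumption the tutorial does not state.

```python
from collections import Counter
from datetime import datetime, timedelta

SPEED_LIMIT_KMH = 804.0      # the speed limit used in the monitor filter
WINDOW = timedelta(hours=1)  # "past hour" time window


def alert_counts(points, now):
    """Count travel_speed points at or above the limit in the past hour, per src."""
    counts = Counter()
    for p in points:
        in_window = now - WINDOW <= p["valid_from"] <= now
        if in_window and p["metric"] == "travel_speed" and p["value"] >= SPEED_LIMIT_KMH:
            counts[p["src"]] += 1
    return counts


# Hypothetical metric points
now = datetime(2024, 1, 1, 12, 0)
points = [
    {"valid_from": now - timedelta(minutes=10), "src": "10.0.0.1", "metric": "travel_speed", "value": 2785.0},
    {"valid_from": now - timedelta(minutes=20), "src": "10.0.0.1", "metric": "travel_distance", "value": 5570.0},  # other metric
    {"valid_from": now - timedelta(minutes=30), "src": "10.0.0.2", "metric": "travel_speed", "value": 120.0},      # below the limit
    {"valid_from": now - timedelta(hours=3), "src": "10.0.0.3", "metric": "travel_speed", "value": 900.0},         # outside the window
]

counts = alert_counts(points, now)
print(dict(counts))  # sources with at least one qualifying point
```

Filtering on the metric name matters because the same value column also carries travel_distance, whose values in kilometers would otherwise be compared against a speed limit.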