Tutorial: Improbable Travel Search
This tutorial shows you how to detect the classic authentication security problem of “improbable travel”, also known as “the Superman problem”. When user activity is tagged with geographic locations and times, you can calculate how quickly a person would have needed to move between successive locations. Could they realistically have covered the distance between those locations in the time between the events? Any pair of successful authentication events that implies a travel speed above 500 mph (804 km/h) is a potentially useful alert for this use case, though there are some potential false positive cases that we will discuss.
Requirements:
You will need an event Dataset that has fields for a user’s action (such as authentication records) and the user’s latitude and longitude. The location fields should be of type float64 for use with the OPAL function haversine_distance_km, which computes the distance between two locations.
If latitude and longitude are not already present in your data, they can be produced from IPv4 addresses using Observe’s lookup_ip_info verb.
You will want to review the rest of the data set for more distinguishing fields to group by in order to increase the strength of your search. For example, you might also group by the user’s device so that you don’t unintentionally blend laptop and mobile phone activity, which can create a false positive.
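To make the distance requirement concrete, here is a minimal Python sketch of the haversine great-circle formula, which is the calculation the name haversine_distance_km refers to. It is an illustration only and not Observe's implementation; the function name haversine_km and the mean Earth radius of 6371 km are assumptions of this sketch.

```python
import math

# Assumption: mean Earth radius in kilometres. The radius used by the
# OPAL function is not stated in this tutorial.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

The inputs are plain floating-point degrees, which is why the location fields need a numeric type rather than strings.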
Choosing the Source Dataset
Identify the desired fields in your selected Dataset. In this example data, we will group by the src field as our device/user of interest and use the src_lat field for latitude and the src_long field for longitude values.
When planning an improbable travel search, consider the desired time range and granularity. It is best to review a shorter window of data, such as four hours, in order to balance alert sensitivity with performance. It may also be best to avoid summarizing commands such as timechart or statsby, so that the details of individual records are not missed. Note that these commands can be quite useful for related use cases, such as a dashboard showing the amount of traffic per region.
Figure 1 - Source Data for the Improbable Travel search
Finding Travel Distance and Speed
To solve the improbable travel problem, we will add two geographic metrics to the data:
Distance between successive events: travel_distance
Speed needed to traverse that distance: travel_speed
Open your Dataset in a new worksheet
Open the OPAL Console panel
Use the following OPAL to create travel_distance and travel_speed:
// Rename the source dataset columns to standardize them
rename_col lat:src_lat, long:src_long
// Filter for events with latitude
filter (not is_null(lat))
// Obtain the latitude, longitude, timestamp from the previous event in time by user
// The lag option allows the fetching of the values from previous event
// The frame option is required to ensure the metrics can be accelerated to enable monitor setup
make_col lat_previous:window(lag(lat, 1), frame(back:1440m), group_by(user), order_by(timestamp)),
    long_previous:window(lag(long, 1), frame(back:1440m), group_by(user), order_by(timestamp)),
    timestamp_previous:window(lag(timestamp, 1), frame(back:1440m), group_by(user), order_by(timestamp))
// compute the distance (Km) between the previous and current locations
make_col distance:haversine_distance_km(
    lat, long, // Current Event Location
    lat_previous, long_previous // Previous Event Location
)
// Obtain the Duration type value between the previous and current timestamps
make_col event_duration:(duration(timestamp_previous, timestamp))
// Convert the Duration to hours
make_col event_duration_hr:event_duration/1h
// Compute the speed in km/h from the distance (km) and the duration (hours)
make_col speed:distance/event_duration_hr
sort desc(timestamp),asc(user)
// Create Metrics from Speed
make_col metrics:make_object("travel_speed":speed,"travel_distance":distance)
flatten_leaves metrics
// Reduce to the columns needed to produce our metrics
pick_col BUNDLE_TIMESTAMP,
    valid_from:timestamp,
    user,
    src,
    metric:string(_c_metrics_path),
    speed:float64(_c_metrics_value)
interface "metric",
    metric:metric,
    value:speed
// Undocumented UI command to highlight rows
highlight speed>=804
Click Run and check that the values are correct.
Figure 2 - Adding Distance and Speed to the Improbable Travel search
Click Publish New Dataset and name it (e.g. Improbable Travel Metrics), then click Publish to publish the new Dataset.
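For readers who want to sanity-check the numbers outside of Observe, the following Python sketch mirrors the logic of the OPAL above: for each user it pairs every event with that user's previous event, computes the distance, and divides by the elapsed hours. It is a hedged illustration, not the OPAL engine's behavior. The Event class, the travel_metrics and haversine_km names, and the 6371 km Earth radius are all assumptions of this sketch, and it skips pairs with zero elapsed time to avoid dividing by zero, a case the tutorial does not address.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

SPEED_LIMIT_KMH = 804.0              # roughly 500 mph, the tutorial's threshold
LOOKBACK = timedelta(minutes=1440)   # mirrors frame(back:1440m)
EARTH_RADIUS_KM = 6371.0             # assumption: mean Earth radius


@dataclass
class Event:
    """Hypothetical record shape: one located event for one user."""
    user: str
    timestamp: datetime
    lat: float
    long: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def travel_metrics(events: List[Event]) -> List[Tuple[Event, float, float]]:
    """Return (event, distance_km, speed_kmh) for each event with a usable predecessor."""
    previous: Dict[str, Event] = {}
    results: List[Tuple[Event, float, float]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        prev = previous.get(event.user)
        previous[event.user] = event  # this event becomes the next "lag" value
        if prev is None:
            continue  # first event for this user: nothing to compare against
        elapsed = event.timestamp - prev.timestamp
        if elapsed <= timedelta(0) or elapsed > LOOKBACK:
            continue  # zero duration or outside the look-back frame
        hours = elapsed / timedelta(hours=1)
        distance = haversine_km(event.lat, event.long, prev.lat, prev.long)
        results.append((event, distance, distance / hours))
    return results
```

Comparing a result's speed against SPEED_LIMIT_KMH then reproduces the threshold check that the monitor applies in the next section.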
Creating the Monitor:
When we created the Dataset, we did not apply the speed limit. In this step, we will create a monitor and add a filter for our speed limit.
Open your saved Metrics Dataset
Click Create Monitor
Click Edit Monitored Dataset
Open the OPAL Console panel and add the following to the end:
// Filter for the travel_speed metric where it is equal to or over 804 km/h
filter metric = "travel_speed" and speed>=804
Click Apply to update the monitored Dataset
Set the type to Count and the time window to the past hour.
Figure 3 - Setting up the Improbable Travel monitor
Set the alert grouping to src, as this is our device/user field.
Figure 4 - Setting up group_by in the Improbable Travel monitor
As these are simple point-in-time alerts, we do not need status updates.
Figure 5 - Setting options in the Improbable Travel monitor
Name your Monitor, e.g. Improbable Travel Alert
Click Save to activate your Monitor
Review the generated alerts under Monitors, Alerts
Figure 6 - Reviewing an alert from the Improbable Travel monitor
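As a rough mental model of what this monitor configuration asks for, the Python sketch below counts travel_speed metric rows at or above 804 km/h within the past hour, grouped by src. How Observe evaluates monitors internally is not described in this tutorial, so treat this purely as an illustration. The MetricRow type and the alert_counts function are hypothetical names introduced for the example.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class MetricRow(NamedTuple):
    """Hypothetical shape of one row in the published metrics Dataset."""
    valid_from: datetime
    src: str
    metric: str
    value: float


def alert_counts(
    rows: Iterable[MetricRow],
    now: datetime,
    window: timedelta = timedelta(hours=1),
    limit_kmh: float = 804.0,
) -> Counter:
    """Count rows per src that match the monitor's filter inside the time window."""
    counts: Counter = Counter()
    for row in rows:
        in_window = now - window < row.valid_from <= now
        # Same condition as the OPAL filter: travel_speed metric at or over the limit.
        if in_window and row.metric == "travel_speed" and row.value >= limit_kmh:
            counts[row.src] += 1
    return counts
```

Any src with a nonzero count in the result corresponds to a group that would be expected to raise an alert under these settings.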