Components of a Cyber Solution with Kx

12 Jun 2018
by Tim Thornton and Doug Talbott

At Kx25, the international kdb+ user conference held May 18th in New York City, Kx’s Mike Thomas presented the Kx for Cyber platform. His presentation is available on the Kx YouTube channel.

Introduction

The Kx for Cyber platform is a unique combination of traditional SIEM and Security Orchestration, Automation/Analysis and Remediation (SOAR). Powered by kdb+, the world’s fastest time-series database, Kx for Cyber can rapidly process huge data volumes, alerting and reporting on events in real time. This same speed can be leveraged by analysts to prioritize and assess data, quickly discovering and operationalizing actionable insights.

Example Cyber Security Workflow

In this blog we’ll focus on tools designed to aid and augment the security team. Here we provide an example workflow for the ad hoc analysis of cyber data, looking to identify patterns of interest from previously unseen network data. This workflow consists of four main activities, pictured below. We will illustrate how Kx for Cyber can be used to dynamically explore and analyze network data at scale.

Extract, Transform and Load

Kx for Cyber includes a full-featured transformation capability that lets you perform ETL operations on your data. Using a point-and-click interface, you create a wiring diagram of your data transformation by working with a data sample. Once completed, you can save the transformation and run it either programmatically or from the user interface.

Kx for Cyber allows you to import a variety of cyber data formats (e.g. CSV, ODBC, JSON, PCAP, NETFLOW, STIX etc.); perform a range of transformations (e.g. type changes, data fills, renaming, splitting columns, merging columns etc.); create table joins; enrich data with other sources; output visualizations; and output transformed data in a range of formats.

In Figure 1, we use ~20 million pcap records to create a simple transformation. In the topmost pane, you can see the entire transformation. On the left, pcap data is imported via an Import node (blue). Some of the columns are modified using an Action node (green). An additional geolocation lookup table is generated using a Function node (purple) and joined with the Action node output (orange). We perform some additional data manipulation in two other Action nodes (green) and output the results to an in-memory database (red) and a sample graphic (red). The details of the selected Action node (blue highlight) are shown below the topmost pane. On the left is an action list and on the right is a sample of the result of the actions. Double-clicking any of the actions raises a dialog that allows you to modify the action in a point-and-click fashion. Transformations can be saved, versioned and re-used by team members, or added into runtime threat detection workflows.

Figure 1: Importing and transforming data using our Visual Transformation tool
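
The steps in this transformation (import, column actions, a geolocation join, output) can be approximated in a few lines of ordinary code. The following is a minimal Python sketch of the same idea, not the Kx for Cyber API: the column names, the sample rows, and the GEO_LOOKUP table are all hypothetical and stand in for flattened packet records and an enrichment source.

```python
import csv
import io

# Hypothetical sample of flattened packet records (not real capture data).
SAMPLE_CSV = """ts,src,sport,dst,dport,proto
1528790400,203.0.113.5,40001,192.0.2.10,22,6
1528790401,198.51.100.7,,192.0.2.10,53,17
"""

# Hypothetical geolocation lookup table keyed by source IP.
GEO_LOOKUP = {
    "203.0.113.5": {"country": "Country A", "lat": 10.0, "lon": 20.0},
}

# IANA protocol numbers for TCP and UDP.
PROTO_NAMES = {6: "TCP", 17: "UDP"}


def transform(rows, geo):
    """Rename columns, convert types, fill gaps, and join geolocation data."""
    out = []
    for row in rows:
        rec = {
            # Renaming plus type changes: strings become integers.
            "time": int(row["ts"]),
            "src_ip": row["src"],
            # Data fill: a missing source port becomes 0.
            "src_port": int(row["sport"]) if row["sport"] else 0,
            "dst_ip": row["dst"],
            "dst_port": int(row["dport"]),
            "protocol": PROTO_NAMES.get(int(row["proto"]), "OTHER"),
        }
        # Left join against the lookup table; unmatched IPs are kept.
        rec["country"] = geo.get(rec["src_ip"], {}).get("country", "unknown")
        out.append(rec)
    return out


records = transform(csv.DictReader(io.StringIO(SAMPLE_CSV)), GEO_LOOKUP)
for rec in records:
    print(rec)
```

Keeping the join as a left join means records without a geolocation match still flow through to the output, which matters when the goal is to explore all of the traffic rather than only the enriched part.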

Ad Hoc Data Analysis and Visualization

Once the data has been transformed, you can use Kx for Cyber’s interactive analysis environment to query and visualize it. Using the Visual Inspector, analysts can query massive datasets and visualize the results in real time with a wide range of pre-built charts, including tables, histograms, lines, scatters, bars, networks, and maps. The visual attributes of each chart (e.g. size, color, alpha) can be adjusted on the fly to reference multiple features of the data. Users can also drill down, drill over, and brush any aspect of their data. Like transformations, visualizations can be saved and viewed programmatically or via the GUI. They can also be incorporated into the Dashboards for Kx product for reporting.

In Figure 2, we use the Visual Inspector to search for port scans in the 20-million-record dataset by plotting the source and destination ports and experimenting with the visual attributes. Kx for Cyber is designed for Big Data, so plots of this scale are generated instantly and you can explore large datasets on the fly. In this figure, UDP and TCP data are plotted together against a timeline. As you can see, few patterns can be observed in the data. We can easily re-plot the data across different dimensions to see if patterns appear. In Figure 3, we re-plot using just the TCP data and reveal potential port scan patterns (e.g. vertical and diagonal lines). Through drill-down, drill-over, and re-plotting, we explore the lines individually and notice that a high volume of traffic is targeting port 22.

Figure 2: Visual Inspector showing UDP and TCP traffic

Figure 3: Visual Inspector showing TCP traffic only
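
The re-plotting step above amounts to filtering by protocol and then looking at how traffic is distributed across ports. As a rough illustration only, here is a Python sketch of that filter-then-summarize pattern. The record layout and sample values are hypothetical, and the code is a stand-in for the interactive exploration, not a description of how the Visual Inspector works internally.

```python
from collections import Counter, defaultdict

# Hypothetical records: (time_seconds, protocol, src_ip, src_port, dst_port)
RECORDS = [
    (0, "TCP", "203.0.113.5", 40001, 22),
    (1, "TCP", "203.0.113.5", 40002, 22),
    (2, "TCP", "203.0.113.5", 40003, 22),
    (3, "UDP", "198.51.100.7", 5353, 53),
    (4, "TCP", "198.51.100.9", 51000, 80),
    (5, "TCP", "198.51.100.9", 51000, 443),
]


def tcp_only(records):
    """Keep only TCP records, mirroring the TCP-only re-plot."""
    return [r for r in records if r[1] == "TCP"]


def top_destination_ports(records, n=3):
    """Return the n most frequently targeted destination ports."""
    return Counter(r[4] for r in records).most_common(n)


def distinct_dst_ports_by_source(records):
    """Count how many different destination ports each source touches."""
    ports = defaultdict(set)
    for _time, _proto, src_ip, _src_port, dst_port in records:
        ports[src_ip].add(dst_port)
    return {ip: len(p) for ip, p in ports.items()}


tcp = tcp_only(RECORDS)
print(top_destination_ports(tcp))
print(distinct_dst_ports_by_source(tcp))
```

A source that touches many destination ports in a short time is one simple signature of a scan, while a single destination port dominating the counts points to the kind of concentrated traffic seen on port 22.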

Custom Visualization Creation

Based on the previous ad hoc analysis, we explore potential port and address correlations by creating two custom visualizations. In Figure 4, we create a diagram containing a matrix of parallel plots. Each plot represents one unique source IP address that used a high number of distinct source ports to target our port 22 over a four-hour window; our query identified 27 such addresses. In each plot you can see the source port (far left line), source IP, protocol, destination IP, and destination port (far right line). We immediately notice three IP addresses with a high volume of traffic (top left red lines) and one IP address with a very distinctive “fan-in” pattern, likely indicating a port attack (left middle).

Figure 4: Matrix of Parallel Plots
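
The query behind this matrix counts, for each source IP address, the distinct source ports used against port 22 inside a time window. A minimal Python sketch of that heuristic follows. The record layout, the sample data, and the MIN_DISTINCT_SOURCE_PORTS threshold are assumptions made for illustration; the original query and its cut-off are not given in this post.

```python
from collections import defaultdict

WINDOW_SECONDS = 4 * 60 * 60      # four-hour window, as described in the text
TARGET_PORT = 22
MIN_DISTINCT_SOURCE_PORTS = 3     # hypothetical threshold for "high"

# Hypothetical records: (time_seconds, src_ip, src_port, dst_ip, dst_port)
RECORDS = [
    (10, "203.0.113.5", 40001, "192.0.2.10", 22),
    (20, "203.0.113.5", 40002, "192.0.2.10", 22),
    (30, "203.0.113.5", 40003, "192.0.2.10", 22),
    (40, "198.51.100.7", 50000, "192.0.2.10", 22),
    (50, "198.51.100.7", 50000, "192.0.2.10", 22),
    (60, "198.51.100.9", 51000, "192.0.2.10", 80),
    (15000, "203.0.113.5", 40004, "192.0.2.10", 22),  # outside the first window
]


def distinct_source_ports(records, window_start):
    """Count distinct source ports per source IP hitting TARGET_PORT in one window."""
    window_end = window_start + WINDOW_SECONDS
    ports = defaultdict(set)
    for t, src_ip, src_port, _dst_ip, dst_port in records:
        if dst_port == TARGET_PORT and window_start <= t < window_end:
            ports[src_ip].add(src_port)
    return {ip: len(p) for ip, p in ports.items()}


def suspects(counts, threshold=MIN_DISTINCT_SOURCE_PORTS):
    """Return source IPs whose distinct source port count meets the threshold."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


counts = distinct_source_ports(RECORDS, window_start=0)
flagged = suspects(counts)
print(counts)
print(flagged)
```

Counting distinct ports with a set, rather than counting packets, keeps a single chatty connection (one source port reused many times) from being flagged.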

This visual, run over multiple time windows, validates our heuristic for finding port attacks such as this source IP attacking port 22. It also gives a high-level overview of our network in a window, and leverages human pattern matching to identify port attacks (left fan-in) and port scans (right fan-out), as well as other common attacks.

In Figure 5, we create a second custom plot using geolocation data from the enrichment done in the Transformer. Here we apply the heuristic identified above to our historical data: we plot the sum of distinct source ports, by unique IP address, that have been targeting port 22 within a given window, and display the results geographically. From this plot, we can easily see all of the locations that are heavily targeting port 22 from a large number of source ports.

Figure 5: Geolocation Plot
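
The aggregation behind a plot like this can be sketched as a join between per-IP port counts and a geolocation table, followed by a sum per location. The Python below is an illustrative sketch only: PORT_COUNTS, GEO, and the location names are hypothetical values, not results from the dataset used in this post.

```python
from collections import defaultdict

# Hypothetical per-IP counts of distinct source ports targeting port 22.
PORT_COUNTS = {
    "203.0.113.5": 120,
    "203.0.113.9": 30,
    "198.51.100.7": 45,
    "192.0.2.77": 5,
}

# Hypothetical geolocation enrichment: IP -> (location, latitude, longitude).
GEO = {
    "203.0.113.5": ("Country A", 10.0, 20.0),
    "203.0.113.9": ("Country A", 10.5, 20.5),
    "198.51.100.7": ("Country B", -5.0, 100.0),
}


def totals_by_location(counts, geo):
    """Sum distinct source port counts per location, largest first."""
    totals = defaultdict(int)
    for ip, n in counts.items():
        # IPs with no geolocation match are grouped under "unknown".
        location = geo.get(ip, ("unknown", None, None))[0]
        totals[location] += n
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


ranking = totals_by_location(PORT_COUNTS, GEO)
for location, total in ranking:
    print(location, total)
```

The resulting totals are what a map layer would encode as marker size or color for each location.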

From this brief example, you can see how Kx for Cyber can be used for dynamic exploration and visualization of cyber data.
