Cybersecurity Threat Intelligence Empowered by Graph Data Visualization

OVERVIEW

Hackers keep finding new ways to get past our cyber defenses. Verizon’s 2020 Data Breach Investigations Report (DBIR) identified a “record total of 157,525 incidents” with “3,950 confirmed data breaches.”

Cybersecurity analysts face a classic big data problem when analyzing disparate sources of SIEM/log management data. With graph intelligence, they can cut through the noise and investigate cyber attacks flexibly. Cybersecurity threat intelligence supported by intuitive graph analytics can evolve with the complexity of cyber attacks and give analysts better situational understanding for investigative analysis.

SIEM/LM DATA

In this use case, we’ll be looking at log management (LM) data provided by Nginx, a notable open source web server with low memory usage and high concurrency. It runs security information and event management (SIEM) with EventTracker, using Syslog to provide LM reports. With LM data that monitors the security and operations of the server, we’ll look at suspicious activity in more depth.

We start by downloading a sample data set with a log of IP addresses, access times, paths, and agents on a company’s server.
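For readers starting from raw logs instead of a prepared table, the four fields can be recovered from each access-log line. The sketch below assumes Nginx's default "combined" log format; the sample line, the `parseLogLine` name, and the returned field names are illustrative assumptions and may not match the columns in the sample data set.

```javascript
// Assumed layout (Nginx default "combined" format):
// ip - user [time] "request" status bytes "referer" "agent"
const COMBINED_LOG =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) \S+ "[^"]*" "([^"]*)"$/;

function parseLogLine(line) {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null; // line does not match the assumed format
  const [, ip, accessTime, request, status, agent] = m;
  // A request is usually "METHOD path PROTOCOL"; keep the raw text otherwise.
  const parts = request.split(' ');
  const path = parts.length >= 2 ? parts[1] : request;
  return { ip, accessTime, path, status: Number(status), agent };
}

// Made-up sample line using a documentation-reserved IP address.
const sampleLine =
  '203.0.113.7 - - [29/Jan/2019:01:02:03 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "Mozilla/5.0 (X11; Linux x86_64)"';

console.log(parseLogLine(sampleLine));
```

Lines that do not fit the assumed format return null, so they can be counted or inspected separately.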

Download sample data attached to this blog post here.

Looking at this data, we can point out suspicious behavior in the paths requested from the server. In this particular case, we noticed that PHP (a general-purpose scripting language used primarily for web development) was out of scope for the server. Knowing this, we safely assumed that requests for PHP paths were explicit attacks on the server. We took note of this and began our analysis to answer the following questions:

  1. Where are these attacks coming from?
  2. When did these attacks happen?
  3. Which ones are explicitly malicious and/or tricky to detect?

Moving to the graph space

We drag and drop the log data, a .csv file containing the IP address, access time, path, and agent, into a new GraphXR project. Each row in the table immediately populates as a node on the graph. You can work with the data directly in GraphXR to find the “php” paths used by hackers (much like using Cmd+F to search for all the paths in question).

To work with only the paths that include PHP scripts, we perform our first Transform under the f(x) panel to add a new property to all the nodes: isHack. Using the short JavaScript snippet below as a custom function, we assign isHack = 1 to nodes with “php” in the path and isHack = 0 to all others. Applying the Filter function, we can then delete all nodes with isHack = 0 to focus only on what we know to be hacker-related incidents.

(propVal, props) => /php/i.test(propVal) ? 1 : 0
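Outside GraphXR, the same flag-and-filter step can be sketched in plain JavaScript. The rows below are hypothetical stand-ins for the nodes created from the .csv file, and the `path` field name is an assumption.

```javascript
// Hypothetical rows standing in for nodes loaded from the .csv file.
const rows = [
  { ip: '203.0.113.7', path: '/wp-login.php' },
  { ip: '198.51.100.4', path: '/index.html' },
  { ip: '203.0.113.7', path: '/admin/config.PHP' },
];

// Same test as the transform: 1 if the path mentions php (any case), else 0.
const isHack = (path) => (/php/i.test(path) ? 1 : 0);

// Add the property to every row, then keep only flagged rows (the Filter step).
const flagged = rows
  .map((row) => ({ ...row, isHack: isHack(row.path) }))
  .filter((row) => row.isHack === 1);

console.log(flagged.map((row) => row.path));
```

A substring match like this flags any path containing "php", which is acceptable here only because PHP was out of scope for this server.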

Geolocate IP Addresses

Now that we’ve filtered our data to focus on the suspicious activity, we can identify where the threat originates via the IP addresses. Going back to our Transform panel, we extract the IP address to see who might be a repeat offender and where they are sending their attacks from.

Clusters immediately emerge around certain IP addresses, showing how incidents concentrate on each address. Taking the data to the world map, we can see where these attacks originate.
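The clustering is essentially a count of incidents per IP address. As a rough sketch of that idea, assuming hypothetical rows with an `ip` field, repeat offenders can be ranked like this:

```javascript
// Hypothetical flagged incidents; only the ip field matters here.
const incidents = [
  { ip: '203.0.113.7' },
  { ip: '198.51.100.4' },
  { ip: '203.0.113.7' },
  { ip: '203.0.113.7' },
  { ip: '192.0.2.9' },
];

function countByIp(rows) {
  const counts = new Map();
  for (const { ip } of rows) {
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  // Sort descending by count so repeat offenders come first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(countByIp(incidents));
```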

Click the Map function on the left-hand side, then go back to Transform to connect these clusters to the map. Select the Connector tab to access the IP => Geo (iplocation.com) tool, which looks up the latitude/longitude (lat/lng) for each IP address using open source IP locator tools. Check Create New Node and click Run to watch the graph transform. You’ll see each location node connected to an IP address, which is in turn connected to the paths attacked on the server.
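The resulting shape (location, then IP address, then requested paths) can be sketched as a small node-and-edge structure. Everything here is an assumption made for illustration: the `geoByIp` table is a placeholder for the connector's lookup with made-up coordinates, and the node types and edge names are not GraphXR's own.

```javascript
// Placeholder lookup standing in for the IP => Geo connector.
// Addresses are documentation-reserved; coordinates are made up.
const geoByIp = new Map([
  ['203.0.113.7', { lat: 10, lng: 20 }],
  ['198.51.100.4', { lat: -5, lng: 40 }],
]);

// Hypothetical flagged requests.
const requests = [
  { ip: '203.0.113.7', path: '/wp-login.php' },
  { ip: '203.0.113.7', path: '/xmlrpc.php' },
  { ip: '198.51.100.4', path: '/setup.php' },
];

function buildGraph(rows, geo) {
  const nodes = new Map(); // node id -> node
  const edges = [];
  for (const [i, { ip, path }] of rows.entries()) {
    const ipId = `ip:${ip}`;
    if (!nodes.has(ipId)) {
      // First time this IP is seen: create its node and link its location.
      nodes.set(ipId, { id: ipId, type: 'IP', ip });
      const loc = geo.get(ip);
      if (loc) {
        const locId = `loc:${loc.lat},${loc.lng}`;
        if (!nodes.has(locId)) {
          nodes.set(locId, { id: locId, type: 'Location', ...loc });
        }
        edges.push({ from: ipId, to: locId, type: 'LOCATED_AT' });
      }
    }
    // One node per request, linked to the IP it came from.
    const reqId = `req:${i}`;
    nodes.set(reqId, { id: reqId, type: 'Request', path });
    edges.push({ from: reqId, to: ipId, type: 'FROM_IP' });
  }
  return { nodes: [...nodes.values()], edges };
}

const graph = buildGraph(requests, geoByIp);
console.log(graph.nodes.length, graph.edges.length);
```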

Looking back at the Agent column of our Nginx log, we can find the operating system used to perform the attack (Windows, Mac, or Linux) or whether the request came from a bot. To pull that property from the long user-agent string, we use the following JavaScript, entered as a custom formula under the f(x) transform.

(propVal, props) => /window/i.test(propVal) ? "window" : /mac/i.test(propVal) ? "mac" : /linux/i.test(propVal) ? "linux" : /bot/i.test(propVal) ? "bot" : propVal
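One caveat with the order of the tests above: many crawler user agents also name an operating system, so a bot can end up labeled by that OS. A possible variation, offered as a sketch and not as the transform used in this walkthrough, checks for "bot" first. The `classifyAgent` name and the sample agent strings are illustrative.

```javascript
// Variation (an assumption, not the original transform): test for bots first.
// Labels and the fallback (return the agent unchanged) match the original.
const classifyAgent = (agent) =>
  /bot/i.test(agent) ? 'bot'
  : /window/i.test(agent) ? 'window'
  : /mac/i.test(agent) ? 'mac'
  : /linux/i.test(agent) ? 'linux'
  : agent;

// Illustrative user-agent strings.
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2)',
  'Mozilla/5.0 (X11; Linux x86_64)',
  'Mozilla/5.0 (Linux; Android 6.0.1) (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'curl/7.58.0',
];

console.log(agents.map(classifyAgent));
```

With the original ordering, the fourth string would be labeled "linux" because it mentions Linux before Googlebot.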

Performing time-series analysis

The AccessTime in the original Nginx log is stored in a format that is inconvenient to work with. The original format looks like this:

29/Jan/2019:01:02:03 +0000

Instead, we convert it to an ISO 8601 timestamp like the one below:

2019-01-29T01:02:03.000Z

We do this using the following JavaScript, once again under the f(x) transform:

(propVal, props) => new Date(Date.parse(propVal.replaceAll('/', ' ').replace(':', ' '))).toISOString()
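The one-liner relies on Date.parse accepting a string such as "29 Jan 2019 01:02:03 +0000", which is a non-standard format whose handling is left to the JavaScript engine. Where that is a concern, a more explicit parser is possible. The sketch below is an alternative under that assumption, not the transform used here; `parseNginxTime` is a hypothetical helper name.

```javascript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Matches e.g. "29/Jan/2019:01:02:03 +0000".
const NGINX_TIME =
  /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/;

function parseNginxTime(text) {
  const m = NGINX_TIME.exec(text);
  if (!m) return null; // not in the expected format
  const [, day, mon, year, hh, mm, ss, sign, offH, offM] = m;
  const month = MONTHS.indexOf(mon);
  if (month < 0) return null; // unknown month abbreviation
  // Read the fields as if they were UTC, then remove the zone offset.
  const localAsUtc = Date.UTC(+year, month, +day, +hh, +mm, +ss);
  const offsetMs = (sign === '-' ? -1 : 1) * (+offH * 60 + +offM) * 60000;
  return new Date(localAsUtc - offsetMs);
}

console.log(parseNginxTime('29/Jan/2019:01:02:03 +0000').toISOString());
```

Returning null for unrecognized input avoids the RangeError that toISOString() throws when called on an invalid date.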

Next, we can use the Geometric Layout function to sort by a property. Below, the property AccessTime is laid out on the x-axis from left to right.

Now you can see that one hacker did a whole bunch of probing all at once (the vertical line on the right), while another spaced out the probing over a long period of time, making the attack more difficult to detect. There are also a couple of smaller probes that the hacker didn’t bother to spread out over time, as well as a cluster of one-off probes. It’s possible that some of these are legitimate user activity.
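The patterns read off the layout (bursts, slow probing, one-offs) can also be approximated numerically by looking at each IP address's request count and time span. This is only a sketch: the events are hypothetical, and the 60-second burst threshold is an arbitrary assumption chosen for illustration.

```javascript
// Hypothetical events: an IP address plus an ISO 8601 timestamp.
const events = [
  { ip: '203.0.113.7', time: '2019-01-29T01:02:03.000Z' },
  { ip: '203.0.113.7', time: '2019-01-29T01:02:05.000Z' },
  { ip: '203.0.113.7', time: '2019-01-29T01:02:09.000Z' },
  { ip: '198.51.100.4', time: '2019-01-20T08:00:00.000Z' },
  { ip: '198.51.100.4', time: '2019-01-25T08:00:00.000Z' },
  { ip: '198.51.100.4', time: '2019-01-30T08:00:00.000Z' },
  { ip: '192.0.2.9', time: '2019-01-22T12:00:00.000Z' },
];

// Summarize each IP: number of requests and seconds between first and last.
function profileByIp(rows) {
  const profiles = new Map();
  for (const { ip, time } of rows) {
    const t = Date.parse(time);
    const p = profiles.get(ip) || { ip, count: 0, first: t, last: t };
    p.count += 1;
    p.first = Math.min(p.first, t);
    p.last = Math.max(p.last, t);
    profiles.set(ip, p);
  }
  return [...profiles.values()].map((p) => ({
    ip: p.ip,
    count: p.count,
    spanSeconds: (p.last - p.first) / 1000,
  }));
}

// Illustrative threshold (an assumption): a span under a minute is a burst.
const BURST_SECONDS = 60;

function labelPattern(profile) {
  if (profile.count === 1) return 'one-off';
  return profile.spanSeconds <= BURST_SECONDS ? 'burst' : 'spread out';
}

const labelled = profileByIp(events).map((p) => ({ ...p, pattern: labelPattern(p) }));
console.log(labelled);
```

A label like this is a starting point for review, since a one-off or spread-out pattern may still be legitimate user activity.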

All in all, this is only one of many vantage points from which to look at cyber threats. Thank you for following along with this low-code graph solution to geolocate suspicious IP addresses across time.

Contact us to learn more, and stay tuned for future cybersecurity graph tricks.