How to Evaluate Threat Intelligence Feed Metadata for Better Context and Accuracy

Threat intelligence feeds are a powerful way to identify attacks that use known infrastructure and malware. Unfortunately, teams can spend a lot of time chasing down alarms triggered by IP addresses that appear on a threat feed. So how can teams choose which addresses warrant deeper investigation?

Ultimate Security Windows Host Randy Franklin Smith and sales engineer Nicholas Ritter hosted a live webinar to explore how teams can effectively use threat feed metadata and evaluate different types of indicators of compromise (IoCs) to accelerate investigation of the right threat data alarms.

Watch the entire on-demand webcast below or keep reading for a transcribed excerpt that will introduce you the fundamentals of evaluating threat intelligence feed metadata.


Note: Transcripts are generated using a combination of speech recognition software and human transcribers, and may contain errors. Please check the corresponding audio before quoting in print.

2:52] Randy: Well let’s get started. Let’s just start off talking about threat intel data, okay.

Nicholas: All right. So, a lot of the elements I’m going to go over is some of it’s going to be review for some people. The one thing I’ve learned in the security space is that we come from a wide set of backgrounds and experience base, so some of this is going to be review, some of it not, but it’s good to get some basic fundamental knowledge down to understand what we’re talking about. The first is: what is threat intelligence?

I posted here a Gartner definition that I think really applies,

“Threat intelligence is the evidence-based knowledge, including context, mechanisms, indicators, implications and action-oriented advice about an existing or emerging menace or hazard to assets.

So, in the greater context of IT security threat intelligence encompasses really a pretty wide variety of datasets from IP addresses to port numbers or combinations of threat data, IP addresses and port numbers all the way down to hashes and URLs and file names and process names.

Randy: So, you know, the one thing I’d say here if you don’t mind me jumping in with my two cents is that the only thing I would augment in Gartner’s definition here is that it is technically actionable if that makes sense. It’s not like soft information that says here’s the techniques that the bad guys use, here’s how buffer overflows, this kind of buffer overflow works. Its literal pieces of information that you can you know use in a wide variety of automated settings. Would you agree with that?

Nicholas: Oh, for sure. Yes, because threat intelligence discusses who’s doing it, some of how they’re doing it, but it doesn’t really get into the detail of how they’re doing it. We can see a particular application that might be Trojan-ed and we would know that through certain pieces of threat metadata which I’ll go over later, but yeah, it’s definitely exactly as you described.

So, to illustrate what threads encompass and what they contain, data sets is really what it’s all about. Data sets that contain different pieces of information that, either by themselves or coupled with other pieces of information, together can formulate an increasingly accurate and detailed picture of what we’re seeing in an environment. Typically, customers use a SIEM to accomplish this, but not always. So, entries in a threat list are referred to as indicators or indicators of compromise or IoCs. One important point of clarification that I will reiterate several times during this presentation is that IoCs should not be treated with a 100 percent confidence at any point. In fact, the more IOCs you corroborate together or correlate together, the better.

So, this screenshot is from the LogRhythm SIEM specifically, but I’m just showing you what a threat list looks like, what the content looks like. So here, we have IP addresses that happen to be from a ransomware command and control list. They can be URLs that a proxy server would see or that a firewall, next-gen firewall would see. They can be hashes. They could also be process names, file names, they could even be usernames. So, threat lists as a whole really can encompass a lot of different data sets and no data set is better than another and where you get them from varies of course, but just know that we have a wide variety of data sets to choose from that fill certain gaps of visibility and awareness in an environment.

What is on a threat feed slide

Figure 1: Screen shot of presentation slide displaying a threat list

So, a couple other terms to go through.

The first, which you may have heard relating to threat list is TAXII™, which is an acronym and it stands for Trusted Automated Exchange of Intelligence Information. It’s a protocol that’s that you’ll see usually coupled with another term that I’m going to go through in a seconSIMEd, but which is STIX™ (structured threat information expression), but TAXII is the format by which we can transport data across, between different threat providers or between a threat provider and a threat user. STIX is relating to that. You’ll pretty much always see STIX and TAXII used together. One is for the formatting of the data and the standardization of how we format threat data that we’re going to share with others and the other one goes over how we transport that STIX formatted data with another user.

So, STIX and TAXII are two terms you’re going to see a lot. Over the last several years, we’ve been seeing them increasingly because it’s an emerging standard. Well first it was an emerging in standard and now it pretty much is a standard. A lot of the time when you’re using some sort of integration mechanism whether it’s one you write, or one that you download, or one that you purchase, make sure that it’s STIX/TAXII compliant, because that’s how you’re going to get access to more threat feeds.

So, some other important terms to go through that I’m going to touch on a lot during this presentation are terms that apply to really all threat feeds. Although, not all threat feeds have them, certainly all STIX/TAXII compliant feeds have them, but even when the data is there, the data might not be either populated or it might be populated with some sort of generic value. The first is confidence level, usually an integer value between 0 and 100. Sometimes, it’ll be referred to as confidence_level or conf_lvl, but some sort of piece describing the confidence level that the threat provider has for the data presented in the list. It’s a per entry flag.

The next is IoC type or indicator type and it will be noted as if_type or, IoC_type, or indicator_type. This is the piece of information that tells us not only what confidence level the threat provider has for the claim that this content is what they think it is, but this IoC type is telling us what that content maps to — whether it’s malware, or reconnaissance, or denial service botnets, command and control, that sort of thing. So, these two pieces of information are our two things that I think that I’ve experienced with customers are very underutilized. A lot of the time, even customers who have purchased commercial threat feeds don’t know that this stuff exists, so I’m going to go through how we’re going to utilize this information to increase the efficiency of our alarming and the efficiency of our analysis.

So, where do threats feeds come from?

They don’t come from the ether, that’s for sure. Threat data comes from our threat list, which comes from the collective observations of security researchers and analysts that exist throughout the world. Sometimes, they are researchers and analysts that are part of an organization that’s dedicated to researching threats and observing the internet to find what’s going on, looking for bad actors. But they can also be researchers and analysts that exist within an organization that is you know could be retail could be health care. So, it’s a wide variety of people that that this information comes from and that leads to one of the caveats of threat lists, which is how high quality that threat list data is.

Another place where we get threat list data from is from what I refer to as sensor networks. I’m not sure that I’m the only one that refers to them this way, but I’m pretty sure others do as well.

So, by sensor networks I’m referring to organizations or just simply infrastructural components that feedback sensory input back into some central location which then can get shared out into a threat list. So, examples would be like DShield, Symantec DeepSight, any kind of integrations or outputting from tools like MISP or Suricata or snort, the IPS engines and firewalls. Cisco has an organization that shares threat information. There’s CrowdStrike, there’s BrightStor, there’s lots of organizations out there that share this type of information.

Randy: I’ve got a couple thoughts on what you’ve covered so far. So, Nick you know with thinking about where this data comes from and those two pieces of metadata, the confidence level especially, have you any idea how different organizations arrive at what their confidence level is?

Nicholas: That’s a very good question. So, the first piece, that is the most common that I’ve seen, is a voting mechanism. This is the best example of that. Where the number of times that something that an IOC or a suspected IOC is seeing in an environment in an anomalous manner, whether it be through a search in a SIEM platform or through an alarming mechanism, the number of times that something is seen by itself usually will either automatically or not automatically up the value of confidence level. The other way that it happens is when you have seasoned or otherwise specifically trained security analysts that are taking what they’re seeing and they’re using it as the initial pivot point for threat hunting to determine whether what they think is going on, the anomalous behavior, like say an IP address that they’d like to put on threat list is actually indeed doing what it looks like it’s doing and once they do some investigation that may involve some reverse engineering or deeper packet inspection, they can then again raise that confidence level.

Randy: That makes a lot of sense. I had a feeling that it probably varied widely from source to source, and I think it probably means that regardless of what their method is, it’s probably valuable in relation to other IoCs from the same organization, but maybe you need to be careful about interpreting confidence levels the same way from different organizations. They may not be that comparable, you know?

Nicholas: Yeah, I think it to some extent it depends upon whether you’re getting a threat list that’s freely available on the internet and you don’t know what their curation method is versus looking in a TIP platform where you can get a sense of who else is seeing it and how widespread something is whether it’s national, international, or state-centric. You can begin to get a feel for how prolific something is and begin to formulate your own interpretation.

A lot of the times though the lists that you get off the internet are more vote-based or observational in nature. Kind of like when you go to SANS Internet Storm Center and you see how many times some IP showed up it’ll have a count by a number of observations, that kind of thing. But I do warn though that there are some lists out there that I’m not going to name right here, but that I always refer to with customers as lists that I don’t trust because it’s too easy to get on it and it’s too hard to get off of it.

Randy: Mm-hmm. I know about those.

Nicholas: Yeah, because in the end that breeds false positives and alarms.

Randy: Yeah, yeah absolutely. And if you’re spinning your wheels chasing something like that than you are probably missing the important real vulnerability.

Nicholas: Yeah!

[16:01] Randy: Okay, all right cool. Keep going.

Want to learn more about how to use threat feed data to improve your threat hunting accuracy with threat intelligence feed metadata? Watch the full on-demand webcast here.