Automated Collection and Analysis of Open-Source Cyber Threat Intelligence

In collaboration with the Knowledge Discovery in Databases lab at Kansas State University, this project aims to develop machine learning tools and techniques for collection, analysis, and generation of cybersecurity threat intelligence from Open-Source Intelligence (OSINT) sources (e.g., social media, web forums, dark web). The main components of this research are as follows:

  • Collecting relevant intelligence (documents and data) from multiple sources / media
  • Validating the trustworthiness / reliability of sources using the historical “big picture”
  • Fusing heterogeneous sources into a consistent and comprehensible whole
  • Processing data at high volume and rate to find indicators of emerging threats
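
As a rough illustration of the last component only, the sketch below flags terms whose frequency in the current time window jumps well above their average in earlier windows. It is not the method used in this project or its publications. The function name `emerging_terms`, the whitespace tokenization, and the `min_count` and `ratio` thresholds are all hypothetical choices made for the example.

```python
from collections import Counter


def emerging_terms(windows, min_count=3, ratio=2.0):
    """Return (window_index, term) pairs for terms that spike in a window.

    `windows` is a sequence of time windows, each a list of short texts.
    A term is flagged when it appears in at least `min_count` texts of the
    current window and at least `ratio` times its per-window average so far.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    history = Counter()  # cumulative document counts from earlier windows
    seen = 0             # number of earlier windows processed
    flagged = []
    for index, docs in enumerate(windows):
        # Count each term at most once per text (document frequency).
        current = Counter(
            token for doc in docs for token in set(doc.lower().split())
        )
        if seen:  # the first window only establishes the baseline
            for term, count in current.items():
                baseline = history[term] / seen
                # Floor the baseline at 1.0 so unseen terms need a real burst.
                if count >= min_count and count >= ratio * max(baseline, 1.0):
                    flagged.append((index, term))
        history.update(current)
        seen += 1
    return flagged


if __name__ == "__main__":
    stream = [
        ["patch released for browser", "new phishing campaign", "browser update"],
        ["ransomware hits hospital", "ransomware spreading fast",
         "ransomware decryptor", "browser patch"],
    ]
    print(emerging_terms(stream))
```

A production system would need proper tokenization, source weighting, and a streaming data store; the sketch only shows the baseline-versus-current comparison.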

Current Team Members:

Shreya Gopal Sundari
Cytisus Eurydice
Avishek Bose (KDD)
PIs: Dr. Vahid Behzadan, Prof. William Hsu

Affiliate Research Groups:

Knowledge Discovery in Databases lab (Kansas State University)

Tools and Datasets:

Our initial dataset of ~21,000 tweets, manually annotated for their relevance to cyber threat intelligence and for the type of threat, is available in the project’s Git repository. For more information on the collection, annotation, and structure of the dataset, please refer to the relevant paper.

Publications:

  1. Bose, A., Behzadan, V., Aguirre, C., & Hsu, W. H. (2019). A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams. Proceedings of the Foundations of Open Source Intelligence and Security Informatics (FOSINT-SI 2019), Vancouver, Canada, August 27, 2019. 
  2. Behzadan, V., Aguirre, C., Bose, A., & Hsu, W. (2018). Corpus and Deep Learning Classifier for Collection of Cyber Threat Indicators in Twitter Stream. Proceedings of the IEEE International Conference on Big Data 2018 (IEEE BigData 2018) Workshop on Big Data Analytics for Cyber Threat Hunting (CyberHunt 2018), Seattle, WA, USA, December 10-13, 2018.