Ambly: Smart Darknet Spider

Overview

The Darknet is often viewed as a shadowy underworld where unsavory individuals conduct their business away from the watchful eye of society. While this perception holds some truth, it is important to note that anything, good or bad, found on the Darknet can also be found on the Clearnet. The only difference is that the Darknet is more complex and difficult to navigate. Consequently, innovative solutions are required to identify malicious behavior on the Darknet, and Ambly is one such solution.

Ambly is an intelligent spider that has been developed to access Darknet websites. It connects to Tor proxies, allowing it to access Tor websites that are identifiable by the “.onion” domain. During its development phase, Ambly crawled and scraped Darknet websites to create a dataset of URLs and text for further evaluation. It identified a dictionary of terms for labeling websites that contain Cyber Threat Intelligence (CTI), which is crucial to identifying malicious activity on the Darknet relating to cybersecurity.
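The crawl-and-scrape step described above can be illustrated with a minimal Python sketch. It is not Ambly’s actual code: the proxy address assumes a local Tor client listening on its default SOCKS port, and the names TOR_PROXIES, LinkAndTextParser, is_onion and scrape are hypothetical. The sketch only parses HTML that has already been fetched, returning the “.onion” links and visible text that would go into a dataset record.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: a local Tor client exposes a SOCKS5 proxy on 127.0.0.1:9050.
# The "socks5h" scheme asks the proxy to resolve hostnames, which .onion
# addresses require. This dict is only defined here, not used to fetch.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


class LinkAndTextParser(HTMLParser):
    """Collects absolute link targets and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def is_onion(url):
    """True if the URL's host is a Tor hidden service (.onion) address."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def scrape(base_url, html):
    """Return a dataset record: page URL, outgoing .onion links, and text."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    onion_links = sorted({u for u in parser.links if is_onion(u)})
    return {"url": base_url, "links": onion_links,
            "text": " ".join(parser.text_parts)}
```

To fetch live pages, the HTML could be retrieved with an HTTP client that supports SOCKS proxies, for example the requests library with its optional SOCKS extra installed and proxies=TOR_PROXIES. The “.onion” links returned by scrape would then feed the list of pages to visit next.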

Ambly’s long-term objective is to assess every webpage it crawls for CTI and then move on to other pages that it expects to contain relevant information. To achieve this goal, Ambly uses a machine learning model that analyses the initial crawl data and learns what makes certain websites relevant. Pages in the initial dataset are manually labelled as relevant or not relevant, the model is trained to identify the attributes that relevant pages share, and it builds the dictionary of terms from that assessment. As a result, Ambly can evolve over time: re-training the model lets it recognize changes in the language used on Darknet websites.
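As a rough illustration of how a term dictionary could be learned from manually labelled pages and then used to flag new ones, here is a small Python sketch. It is not Ambly’s model, whose type and features are not described here. It substitutes a simple smoothed log-odds weighting of terms, and the function names, the top_k cutoff and the threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(labelled_pages, top_k=5):
    """Learn weighted terms from (text, is_relevant) pairs.

    Each term is weighted by the smoothed log-odds of appearing in a
    relevant page versus a non-relevant one; only the top_k positively
    weighted terms are kept.
    """
    rel, irr = Counter(), Counter()
    for text, relevant in labelled_pages:
        # Count each term once per page (document frequency).
        (rel if relevant else irr).update(set(tokenize(text)))
    n_rel = sum(1 for _, relevant in labelled_pages if relevant)
    n_irr = len(labelled_pages) - n_rel

    weights = {}
    for term in rel:
        p_rel = (rel[term] + 1) / (n_rel + 2)  # add-one smoothing
        p_irr = (irr[term] + 1) / (n_irr + 2)
        weight = math.log(p_rel / p_irr)
        if weight > 0:
            weights[term] = weight
    # Sort by descending weight, breaking ties alphabetically.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return dict(top)


def score(text, dictionary):
    """Sum the weights of the dictionary terms present in the text."""
    tokens = set(tokenize(text))
    return sum(w for term, w in dictionary.items() if term in tokens)


def is_relevant(text, dictionary, threshold=1.0):
    """Flag a page as likely CTI when its score reaches the threshold."""
    return score(text, dictionary) >= threshold
```

Re-training in this sketch simply means calling build_dictionary again on an updated set of labelled pages, so that new vocabulary enters the dictionary. The same score could also be used to rank queued links, so that pages expected to be relevant are visited first.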

Once Ambly is operational, it will be a valuable tool for government agencies looking to research and investigate malicious activity on the Darknet. The task of manually assessing websites can consume a significant amount of human resources, slowing down an agency’s ability to identify and prevent malicious activities. Ambly will help ease this bottleneck by automating the assessment of these websites, allowing analysts to focus on sites that have already been identified as potentially useful.

Keep up with Ambly’s progress by checking out our blog for regular updates!

Current Team Members:

Cytisus A Eurydice
PI: Vahid Behzadan

Tools and Datasets:

Tools used in this research are Google Translate, PermID, and MongoDB Atlas.

Talks or Connected Engagements:

Recon Village talk at DEF CON 28, August 2020

Darknet Workshops hosted by Trace Labs

Workshop 1

Workshop 2

Publications:

None yet; planned for the future.