Editor’s Note: Dr. Sagar Samtani is a speaker for ODSC East 2022 this April 19th-21st. Be sure to check out his talk, An Overview of AI for Cybersecurity: An Overview of the Field and Promising Future Directions, there to learn more about the role of artificial intelligence in cybersecurity!
Artificial Intelligence (AI)-enabled analytics techniques based on machine learning, deep learning, network science, text mining, natural language processing, and more have staked their claim in a variety of disciplines and application areas. To date, some of the most impactful application areas for AI-enabled analytics include healthcare, business intelligence, and customer management. These early successes have motivated many folks within the AI community to search for other critical application areas where AI can make a positive impact.
Recently, practitioners and academics have suggested that cybersecurity could benefit greatly from AI-enabled analytics techniques. Common tasks that cybersecurity professionals execute include asset management (identification, categorization), control allocation, threat identification, and vulnerability management (detection, prioritization, mitigation). However, cybersecurity professionals often face two significant issues when executing these tasks: data overload and staff shortages. Leveraging AI-enabled analytics techniques could therefore offer two key benefits to existing cybersecurity tasks and practices:
- First, given the vast quantities of data that cybersecurity professionals often have to sift through, AI-enabled analytics procedures can help automate many tasks that are currently executed manually. As such, AI-enabled analytics techniques have significant potential to help address the workforce shortages currently seen in the cybersecurity discipline.
- Second, AI-enabled analytics techniques can help detect patterns within large quantities of data that conventional analysis may miss. For example, simple cluster analysis can help cyber analysts group assets with similar vulnerabilities for targeted remediation. Such pattern detection can inform both technical algorithm development and new cybersecurity insights that would otherwise be missed.
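The clustering idea in the second point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: each asset is represented by the set of vulnerability labels a scanner reported for it (the asset names and labels below are hypothetical), similarity is measured with Jaccard overlap, and assets are joined by single-linkage clustering above an assumed threshold of 0.5. Other clustering methods, such as k-means over numeric vulnerability features, could serve the same purpose.

```python
from itertools import combinations

# Hypothetical scan results: asset name -> set of vulnerability labels.
# These labels are placeholders, not real scanner output.
ASSET_VULNS = {
    "web-01": {"tls_weak_cipher", "outdated_httpd", "missing_hsts"},
    "web-02": {"tls_weak_cipher", "outdated_httpd"},
    "db-01": {"default_db_password", "unpatched_db_engine"},
    "db-02": {"default_db_password", "unpatched_db_engine", "open_db_port"},
    "hr-laptop": {"outdated_browser"},
}


def jaccard(a, b):
    """Overlap between two vulnerability sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_assets(asset_vulns, threshold=0.5):
    """Single-linkage clustering: two assets share a group if their similarity >= threshold."""
    parent = {name: name for name in asset_vulns}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(asset_vulns, 2):
        if jaccard(asset_vulns[a], asset_vulns[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in asset_vulns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in cluster_assets(ASSET_VULNS):
        print(group)
    # ['db-01', 'db-02']
    # ['hr-laptop']
    # ['web-01', 'web-02']
```

Each resulting group shares most of its vulnerabilities, so a single remediation action (for example, updating the outdated web server) can be planned per group rather than per asset.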
As the name suggests, AI-enabled cybersecurity analytics is dependent upon data. Broadly speaking, two significant sources of cybersecurity data exist:
- Internal cybersecurity data refers to data accessible within an organization; in other words, data that an organization has access to and often control over. Internal data include network traffic, vulnerability assessments, internal code repositories, and others. Key benefits of internal data include quick access to information about critical assets and a direct understanding of the threats, assets, vulnerabilities, and controls most relevant to the organization.
- External cybersecurity data pertains to data that is publicly accessible but not specific to any single organization. Examples include Dark Web sources (hacker forums, carding shops, DarkNet marketplaces, Internet Relay Chat (IRC), and paste sites) and social media platforms such as Facebook and Twitter. These data are often extremely valuable for helping an organization understand the threats outside its borders.
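One simple way internal and external data complement each other is to check which vulnerabilities found internally are also being discussed externally. The sketch below assumes the external posts have already been collected (collection from Dark Web or social media sources is out of scope here), and it uses hypothetical asset names and placeholder CVE identifiers with the fictional year 2099. The only real convention it relies on is the CVE ID format: `CVE-`, a four-digit year, and a sequence number of four or more digits.

```python
import re

# CVE identifiers: "CVE-", a four-digit year, then a sequence number of 4+ digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

# Hypothetical internal vulnerability assessment: asset -> CVE IDs found on it.
# The year 2099 marks these as placeholders, not real vulnerabilities.
INTERNAL_FINDINGS = {
    "web-01": {"CVE-2099-0001", "CVE-2099-0002"},
    "db-01": {"CVE-2099-0003"},
    "hr-laptop": {"CVE-2099-10001"},
}

# Hypothetical external posts (e.g., forum or paste-site text), assumed already collected.
EXTERNAL_POSTS = [
    "selling working exploit for cve-2099-0001, DM for price",
    "anyone tried CVE-2099-10001 against old builds?",
    "general chatter, no identifiers here",
]


def extract_cves(text):
    """Return the normalized (upper-case) CVE IDs mentioned in a piece of text."""
    return {match.upper() for match in CVE_PATTERN.findall(text)}


def externally_discussed(internal_findings, external_posts):
    """Map each internal asset to its CVEs that also appear in external posts."""
    mentioned = set()
    for post in external_posts:
        mentioned |= extract_cves(post)
    hits = {}
    for asset, cves in internal_findings.items():
        overlap = {cve.upper() for cve in cves} & mentioned
        if overlap:
            hits[asset] = sorted(overlap)
    return hits


if __name__ == "__main__":
    for asset, cves in externally_discussed(INTERNAL_FINDINGS, EXTERNAL_POSTS).items():
        print(asset, cves)
```

Here `web-01` and `hr-laptop` would move up a remediation queue because their vulnerabilities appear in external chatter, while `db-01` would not. A fuller pipeline could add entity extraction beyond CVE IDs, deduplication, and source reliability scoring.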
To date, four significant themes of cybersecurity research and practice have emerged. These themes are neither mutually exclusive nor exhaustive of all possible applications of AI-enabled analytics in cybersecurity. The four themes are as follows:
- Cyber threat intelligence (CTI) focuses on identifying emerging threats and key threat actors to enable effective cybersecurity decision-making. Data ingestion, processing, and dissemination are critical aspects of effective CTI processes. The processing component often comprises techniques such as IP reputation services, event correlation, and malware analysis. These conventional techniques often benefit most from AI-enabled analytics.
- Security Operations Centers (SOCs) are often the heart of many cybersecurity efforts in enterprise organizations. AI-enabled analytics techniques are helping augment common tasks, especially alert management and vulnerability assessment and management.
- Adversarial attack generation to robustify cyber defenses is focused on automatically producing synthesized attack vectors to help improve the functions and operations of common cybersecurity defenses (e.g., anti-malware engines, anti-phishing engines, etc.). Techniques such as generative adversarial networks and deep reinforcement learning are leveraged to search through a candidate space of allowable actions (e.g., additions or edits to malware binaries) to produce attacks automatically.
- Disinformation and computational propaganda analysis aims to help identify and curb the spread of fake news and false content across cyberspace. Predictive analytics and network science-based approaches are essential to combating this ever-growing threat.
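Event correlation (from the CTI theme) and alert management (from the SOC theme) share a common starting problem: many raw alerts often describe one underlying activity. As a rough, rule-based baseline that AI-enabled approaches aim to improve on, the sketch below groups alerts from the same source IP into one incident as long as consecutive alerts are no more than an assumed 300 seconds apart. The `Alert` fields, signature names, and window size are hypothetical; the IP addresses come from the RFC 5737 documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: int   # seconds on any monotonic clock (assumed field)
    source_ip: str
    signature: str   # hypothetical detection rule name


def correlate(alerts, window_seconds=300):
    """Group alerts into incidents: same source IP, consecutive gaps <= window_seconds."""
    incidents = []      # list of incidents, each a list of Alert objects
    latest_for_ip = {}  # source_ip -> index of that IP's most recent incident
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_for_ip.get(alert.source_ip)
        if idx is not None and alert.timestamp - incidents[idx][-1].timestamp <= window_seconds:
            incidents[idx].append(alert)  # close enough in time: same incident
        else:
            incidents.append([alert])     # first alert from this IP, or gap too large
            latest_for_ip[alert.source_ip] = len(incidents) - 1
    return incidents


ALERTS = [
    Alert(0, "192.0.2.10", "port_scan"),
    Alert(60, "192.0.2.10", "ssh_bruteforce"),
    Alert(90, "198.51.100.7", "malware_beacon"),
    Alert(200, "192.0.2.10", "ssh_bruteforce"),
    Alert(1000, "192.0.2.10", "port_scan"),  # 800 s after the previous alert: new incident
]

if __name__ == "__main__":
    incidents = correlate(ALERTS)
    print(len(ALERTS), "alerts ->", len(incidents), "incidents")  # 5 alerts -> 3 incidents
    for incident in incidents:
        print(incident[0].source_ip, [a.signature for a in incident])
```

Learned models could replace the fixed window and exact IP match with similarity measures over many alert attributes; a simple baseline like this one is still useful for measuring how much such models actually reduce analyst workload.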
Despite the benefits of AI-enabled analytics techniques in each of these major themes, current practices face several key challenges:
- Imbalanced class labels: Cybersecurity is in the midst of a paradigm shift from reactive practices to more proactive approaches. To this end, there has been a stronger focus on developing predictive AI-enabled analytics approaches. However, many datasets are highly imbalanced, skewed heavily towards benign instances. A model trained on such data can score high accuracy simply by labeling everything benign while missing the malicious instances that matter most.
- Lack of ground truth and publicly accessible datasets: Establishing ground truth is essential to training reliable predictive models. However, many folks wishing to enter the AI-enabled cybersecurity analytics field are forced to learn from toy, unrealistic, and/or dated cybersecurity data.
- Adaptation of existing AI-enabled analytics techniques: Many AI-enabled analytics techniques were developed within the computer science discipline, largely for general-purpose applications. However, cybersecurity has many unique requirements and data types, so adapting these techniques demands careful consideration to produce effective AI-enabled cybersecurity analytics.
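The imbalanced class labels challenge is easy to demonstrate. In the sketch below, a hypothetical dataset of 980 benign and 20 malicious samples lets a trivial always-benign classifier reach 98% accuracy while detecting nothing. Two common, simple mitigations follow: inverse-frequency class weights (the same `n_samples / (n_classes * count)` heuristic scikit-learn applies for `class_weight="balanced"`) and random oversampling of the minority class. Both are generic techniques shown for illustration; the integer "samples" stand in for real feature vectors.

```python
import random
from collections import Counter

# Hypothetical dataset: 980 benign vs. 20 malicious samples (2% positive rate).
LABELS = ["benign"] * 980 + ["malicious"] * 20
SAMPLES = list(range(len(LABELS)))  # stand-ins for feature vectors


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * (len(xs) + len(extra)))
    return out_x, out_y


if __name__ == "__main__":
    always_benign = ["benign"] * len(LABELS)
    print("always-benign accuracy:", accuracy(always_benign, LABELS))  # 0.98, yet zero detections
    print("class weights:", class_weights(LABELS))
    _, balanced_y = oversample(SAMPLES, LABELS)
    print("after oversampling:", Counter(balanced_y))
```

Oversampling should be applied only to the training split; duplicating minority samples before splitting leaks copies into the test set and inflates evaluation results.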
Despite these current challenges, many promising directions for artificial intelligence in cybersecurity analytics remain. Be sure to come by the sessions related to AI and machine learning at ODSC East 2022 to learn more about existing resources and promising future directions for this emerging field!
About the author/ODSC East 2022 speaker on Artificial Intelligence in Cybersecurity
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Kelley School of Business at Indiana University. Dr. Samtani’s research focuses on developing AI-enabled techniques for cybersecurity applications, namely Dark Web analytics and smart vulnerability management. Dr. Samtani also teaches graduate-level courses on AI for cybersecurity.