Network-Centric Security Research Actions
This section discusses the research actions needed to mitigate the threats, gaps, and challenges identified and reported in Appendix A.2 of document D4.3.
- RA1 – Machine Learning. Over the past decade, the role of machine learning (ML) in cybersecurity has grown as threats have become more serious and the technology more capable. ML methods have been used effectively in both the prevention and the detection stages of network threats. The analysis of the main network threats highlights that a key need in the prevention stage is the availability of tools that can autonomously find and patch vulnerabilities to eliminate potential network weaknesses. Penetration testing is commonly used to look for publicly known vulnerabilities and insecure configurations in the network by planning and generating possible attack exploits. However, network penetration testing requires a significant amount of training and time to perform well. Automated tools such as Metasploit can partially reduce these costs, but they simply run through a list of pre-selected, known exploits to determine whether any machines on a network are vulnerable to them. Recently, machine learning has emerged as a plausible way of conducting penetration tests. Current approaches to automated penetration testing rely on methods that require a model of the exploit outcomes; however, the cybersecurity landscape changes rapidly as new software and attack vectors are developed, which makes producing and maintaining up-to-date models a challenge. To remove the need for such a model, the application of a machine learning technique, namely Reinforcement Learning (RL), has been investigated. RL is an AI optimization technique that, in its model-free variants, does not require a model of the environment to produce an attack policy and instead learns the best policy through interaction with the environment. Schwartz and Kurniawati have shown that RL algorithms can automatically exploit vulnerabilities on networks and deploy attacks on target machines, while Ghanem and Chen proposed integrating RL with existing penetration testing (PT) systems to execute tasks without human intervention. However, studies exploring the use of reinforcement learning for penetration testing typically rely on small environments, often simulated networks of around ten machines with a limited number of exploits provided to the program. As either the complexity of the environment or the number of actions available to the program increases, reinforcement learning can quickly become computationally prohibitive. New research on RL-based penetration testing should focus on developing computationally feasible methods of simulating complex networks that scale to the size of modern large networks while also handling hundreds of possible exploits. As a next step, these algorithms should be applied in more realistic environments, such as VM networks built using information from real organizational networks, to determine how they can be applied in real-world settings.
Beyond finding vulnerabilities with autonomous pentesters, another topic that may need future research is the use of machine learning in vulnerability assessment and management prioritization. Jacobs et al. have used data about attacks observed in the wild to build machine learning systems that predict the likelihood of a vulnerability being exploited. Jacobs et al. also evaluated the use of machine learning-based risk assessments in conjunction with the CVSS (Common Vulnerability Scoring System) to prioritize the vulnerabilities that are most likely to be exploited. Fang et al. proposed a model that predicts the exploitability and exploitation in the wild of vulnerabilities by capturing the key features of each vulnerability. Further research is needed to improve the accuracy and quality of exploit labeling, exploit databases, and the proof of exploits in the wild. More data sources with high coverage and timeliness should also be investigated.
In the detection phase, several researchers have focused on the use of ML for network intrusion and malware detection systems. Intrusion detection systems are typically classified as either misuse-based or anomaly-based, and both can make use of different ML methods. The simplest forms of misuse-based detection rely on known indicators of compromise. This approach identifies malicious events quickly and accurately, thanks to its high processing speed and low false-positive rate. However, since it is based on known threats, it does not protect against novel attacks. ML can be used to automate some forms of misuse-based detection by allowing a system to "learn" what different types of attacks look like. If many examples of past attacks are available, a supervised learning classifier can be trained to identify the signs of different types of attacks, without the need for humans to write specific lists of rules that would trigger an alert. Unlike misuse-based detection, anomaly-based detection flags suspicious behavior without making specific comparisons to past attacks, which allows for the potential identification of novel attacks. This type of detection system is more likely to use unsupervised learning methods to cluster normal traffic within a network and flag as suspicious any activity that deviates from that pattern. However, this type of detection is prone to generating many false positives that are expensive to investigate, mainly because normal traffic can be highly variable. For example, last year, in response to COVID-19, millions of employees suddenly began working from home, which profoundly changed the "usual/normal" traffic profile of many networks. Research in this area has focused on finding ways to appropriately baseline "normal" traffic for a given network. Moreover, the massive increase in network traffic and the resulting security threats have made it challenging to detect malicious intrusions efficiently.
A research challenge for machine learning is the lack of a systematic dataset that reflects new network attacks. Most of the proposed methodologies are not able to detect zero-day attacks because the models are not trained with enough attack types and patterns. New research should test and verify ML models using datasets that contain both older and newer attacks. On the other hand, dataset construction is an expensive process that demands considerable resources and expertise. Hence, one of the research challenges is the systematic construction of an up-to-date dataset with enough instances of almost all attack types. The dataset should be updated frequently to include the latest intrusion instances. Another challenge is that the detection accuracy for certain attack types is lower than the overall detection accuracy of the ML model used. This problem is caused by the imbalanced nature of the dataset: detection accuracy for low-frequency attack classes is lower than for attack classes with more instances. Research in this context requires an up-to-date and balanced dataset, together with efficient techniques that can increase the number of minority attack instances to balance the dataset. Recently, techniques such as SMOTE, RandomOverSampler, and the adaptive synthetic sampling approach (ADASYN) have been applied to reduce the dataset imbalance ratio and improve performance. However, there is still room for improvement, and more research in this direction is needed.
Several Deep Learning (DL)-based algorithms have also been studied for network intrusion detection, showing effective results in detecting malicious attacks thanks to their deep feature learning ability. DL is a subset of ML that uses neural networks with many hidden layers to learn deep feature representations. These techniques can be more effective than traditional ML because their deep structure lets them learn the important features from the dataset on their own and generate an output. However, they are quite complex and require substantial resources in terms of computational power, storage capacity, and time, which raises challenges for implementation in real-time environments. Future work should also explore the hybrid idea of using DL for feature extraction and ML for classification to reduce this complexity. Other studies have also evaluated the use of a deep learning approach for IP hijack detection.
Future research should also investigate the use of machine learning in the context of active defense, with the aim of studying potential adversaries to better anticipate their actions. This is related to Threat Intelligence, that is, gathering intelligence about potential adversaries through the analysis of collected data. Some researchers have explored how ML and text mining can be used to improve threat intelligence analysis. For instance, ML methods can be used to cluster dark web users, and text mining methods could be leveraged to automatically collect, classify, and analyze posts on dark web forums and marketplaces, allowing researchers to identify zero-day exploits before they are deployed. In this context, a fully automated ML system could help anticipate attack vectors by searching the dark web for mentions of potential vulnerabilities associated with an organization's name or a list of its products. Other tools, originally introduced as deceptive tactics, could be repurposed to collect threat intelligence about potential adversaries. For example, information related to an attack, including the tactics used, the country of origin, and so on, can be clustered and used by ML methods to identify similar attacks.
Threats: T2.1.1 – Erroneous use or administration of devices and systems, T2.2.2 – Data session hijacking, T2.3.1 – Exploitation of software bugs, T2.3.3 – Malicious code/software/activity, T2.3.4 – Remote activities (execution), T2.3.6 – Exploitation of vulnerabilities in services and remote tools - COVID-19, T2.3.8 – Attacks to sliced 5G core network, T2.3.5 – Malicious code – Signaling amplification attacks, T2.4.3 – Software bug
Gaps: G2.10 – Gaps on malware detection solution, G2.2 – Gaps on continuous hardening & patching of IT systems, G2.3 – Gaps on security training and awareness toward employees, G2.13 – Gaps on the reduced capacity to perform security operations, G2.15 – Gaps on attack surface awareness, G2.16 – Gaps on the security of the new Open Radio Access Network model
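The model-free idea discussed under RA1 can be illustrated with a minimal tabular Q-learning sketch in Python. It is not the method of Schwartz and Kurniawati or of Ghanem and Chen. The four-host chain, the three numbered exploits, the reward values, and all names (VULNERABLE, step, train, greedy_policy) are hypothetical choices made purely for illustration. The agent never reads the vulnerability table; it learns which exploit works on each host only from the rewards it observes.

```python
import random

# Hypothetical toy network: host i is vulnerable only to exploit VULNERABLE[i].
# The agent never reads this table; it only observes rewards from step().
VULNERABLE = [2, 0, 1, 2]            # 4 hosts in a chain, 3 candidate exploits
N_HOSTS, N_EXPLOITS = len(VULNERABLE), 3
GOAL = N_HOSTS                       # state reached once the last host is compromised


def step(state, action):
    """Simulated environment: return (next_state, reward) for one exploit attempt."""
    if action == VULNERABLE[state]:
        next_state = state + 1
        reward = 10.0 if next_state == GOAL else 0.0
        return next_state, reward
    return state, -1.0               # a failed exploit costs time and makes no progress


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table by interaction only (epsilon-greedy tabular Q-learning)."""
    rng = random.Random(seed)
    # One row per state; the extra last row is the terminal state and stays at zero.
    q = [[0.0] * N_EXPLOITS for _ in range(N_HOSTS + 1)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):          # cap on attempts per episode
            if rng.random() < epsilon:
                action = rng.randrange(N_EXPLOITS)                        # explore
            else:
                action = max(range(N_EXPLOITS), key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q


def greedy_policy(q):
    """Best-known exploit for each host according to the learned Q-table."""
    return [max(range(N_EXPLOITS), key=lambda a: q[s][a]) for s in range(N_HOSTS)]
```

Under these assumed settings, greedy_policy(train()) recovers the exploit each simulated host is vulnerable to. Realistic networks have far larger state and action spaces than this chain, which is one reason the tabular form does not scale and why the text calls for computationally feasible methods.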
- RA2 – Quantum-safe cryptography and security. The advent of large-scale quantum computing poses a significant threat to information infrastructure. Popular cryptographic schemes such as RSA and Elliptic Curve Cryptography, which are based on mathematical problems believed to be hard to solve with the computational power available today, will be easily broken by a sufficiently large quantum computer. This will rapidly accelerate the obsolescence of currently deployed security systems and will expose information transmitted on public channels to eavesdropping. Even encrypted data that is safe against current adversaries can be stored for later decryption once a practical quantum computer becomes available. At the same time, it will no longer be possible to guarantee the integrity and authenticity of transmitted information, as tampered data will go undetected. As a result, communications will become insecure without additional action, such as using quantum-safe cryptography and exploiting enablers such as Quantum Key Distribution. Quantum-safe cryptography refers to efforts to identify algorithms that are resistant to attacks by both classical and quantum computers. Post-Quantum Cryptography (PQC) is today one of the most active topics in cryptographic research. PQC has to maintain integrity and confidentiality while preventing different kinds of attacks. Research is typically concentrated on six families of techniques: symmetric key quantum resistance, supersingular elliptic curve isogeny cryptography, code-based cryptography, hash-based cryptography, multivariate cryptography, and lattice-based cryptography. NIST has initiated a process to solicit, evaluate, and standardize one or more quantum-resistant public-key cryptographic algorithms. Some challenges remain, however: the reconfiguration of legacy devices with new cryptosystems is still an open problem that needs to be solved.
To support the transition to post-quantum cryptography in real-time applications, a wide array of standards needs to be formalized. For example, integration with mobile communications, emergency services, and critical infrastructure requires studying which post-quantum algorithms to choose. In particular, there is a critical need to ensure that 5G and future standards are developed with the future adoption of PQC for public-key ciphers in mind.
Threats: T2.2.3 – Traffic eavesdropping, T2.2.4 – Traffic redirection
Gaps: G2.5 – Gaps in the standardization process to include formal security verification and security assessment/testing of new protocol/network specifications, G2.7 – Gaps on the deployment of the robust crypto algorithm to cipher user plane traffic while minimizing performance impact and interoperability issues
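To make one of the families named under RA2 concrete, the following Python sketch implements a Lamport one-time signature, a textbook hash-based scheme whose security rests only on the hash function. It is an illustration chosen here, not an algorithm taken from the NIST process or from the cited works, and it is not production code. The function names (keygen, sign, verify, digest_bits) and the use of SHA-256 are assumptions of this sketch.

```python
import hashlib
import secrets

N_BITS = 256  # one pair of secrets per bit of the SHA-256 message digest


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Private key: 256 pairs of random 32-byte secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(N_BITS)]
    public = [(sha256(a), sha256(b)) for a, b in private]
    return private, public


def digest_bits(message: bytes):
    """Bits of SHA-256(message), most significant bit of each byte first."""
    digest = sha256(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]


def sign(message: bytes, private):
    """Reveal one secret per digest bit. A key pair must sign only ONE message."""
    return [private[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(message: bytes, signature, public) -> bool:
    """Check that each revealed secret hashes to the public value for its bit."""
    if len(signature) != N_BITS:
        return False
    bits = digest_bits(message)
    return all(sha256(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))
```

A call such as `private, public = keygen()` followed by `verify(msg, sign(msg, private), public)` should return True, while verification of a different message with the same signature should fail. The sketch also hints at the migration cost mentioned above: the signature is 256 values of 32 bytes each, and every key pair can be used only once, which is far from a drop-in replacement for RSA or elliptic-curve signatures on constrained legacy devices.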
Highlights on Identified Research Actions
Future research actions on network cybersecurity should focus on AI- and ML-based solutions and begin to anticipate the use and integration of quantum-safe cryptography. Artificial Intelligence (AI) and Machine Learning (ML) have the potential for use in a wide range of network activities, including service orchestration, demand management, security response, and analytics. AI and ML are playing an increasingly important role in cybersecurity, powering security tools that can analyse data from millions of previous cyber incidents and use it to identify potential threats or new malware variants more quickly, allowing rapid mitigation. These tools are particularly useful given that cyber criminals constantly modify their malware code so that security software no longer recognises it as malicious. But detecting new kinds of malware is not the only way that AI and ML technologies can be deployed to enhance cybersecurity: an AI-based network-monitoring tool can also track what users do on a daily basis, building up a picture of their typical behaviour. By analysing this information, the AI can detect anomalies and react accordingly. In this way, AI and ML enable cybersecurity teams to respond intelligently, understanding the relevance and consequences of a breach or a change of behaviour and developing an adequate response in real time. The more cybersecurity comes to rely on AI and ML, the more AI and ML will become targets of malicious attacks. Moreover, attackers will also use AI to improve their malware and make it resistant to AI-based security tools. Improvements in quantum computing make breaking some cryptographic protocols more practical. This means that communications will become insecure without additional action, such as using quantum-safe cryptography and exploiting enablers such as Quantum Key Distribution.
Much activity is already underway, and the latest requirements for cryptographic protocols for mobile telecommunications have been defined with the need to be quantum safe in mind. In this context, one of the main topics is quantum-secure communication, also known as quantum key distribution (QKD), which targets key exchange, a step at the heart of secure communications (for both fibre-based and satellite links), although the most controversial application of quantum computing remains the breaking of current public-key cryptography. An emerging research topic is quantum machine learning, that is, the integration of quantum algorithms within machine learning programs. While classical machine learning algorithms process immense quantities of data, quantum machine learning uses qubits and quantum operations, or specialized quantum systems, to improve the computational speed and data storage of the algorithms in a program. Hence, quantum computing may help improve classical machine learning algorithms and their applications to cybersecurity. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory.
metasploit-framework, https://github.com/rapid7/metasploit-framework
D. R. McKinnel, T. Dargahi, A. Dehghantanha and K.-K. R. Choo, “A systematic literature review and meta-analysis on artificial intelligence in penetration testing and vulnerability assessment,” Computers & Electrical Engineering, vol. 75, pp. 175-188, 2019.
J. Schwartz and H. Kurniawati, “Autonomous penetration testing using reinforcement learning,” arXiv preprint arXiv:1905.05965, 2019.
M. C. Ghanem and T. M. Chen, “Reinforcement learning for efficient network penetration testing,” Information, vol. 11, no. 1, p. 6, 2020.
J. Jacobs, S. Romanosky, B. Edwards, M. Roytman and I. Adjerid, “Exploit prediction scoring system (EPSS),” arXiv preprint arXiv:1908.04856, 2019.
J. Jacobs, S. Romanosky, I. Adjerid and W. Baker, “Improving vulnerability remediation through better exploit prediction,” Journal of Cybersecurity, vol. 6, no. 1, p. tyaa015, 2020.
Y. Fang, Y. Liu, C. Huang and L. Liu, “FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm,” PLoS ONE, vol. 15, no. 2, p. e0228439, 2020.
G. Karatas, O. Demir and O. K. Sahingoz, “Increasing the performance of machine learning-based IDSs on an imbalanced and up-to-date dataset,” IEEE Access, vol. 8, pp. 32150-32162, 2020.
E. Nunes, A. Diab, A. Gunn, E. Marin, V. Mishra, V. Paliath, J. Robertson, J. Shakarian, A. Thart and P. Shakarian, “Darknet and deepnet mining for proactive cybersecurity threat intelligence,” in 2016 IEEE Conference on Intelligence and Security Informatics (ISI), IEEE, 2016, pp. 7-12.
“Artificial intelligence shines light on the dark web,” https://news.mit.edu/2019/lincoln-laboratory-artificial-intelligence-helping-investigators-fight-dark-web-crime-0513
U. Noor, Z. Anwar, T. Amjad and K.-K. R. Choo, “A machine learning-based FinTech cyber threat attribution framework using high-level indicators of compromise,” Future Generation Computer Systems, vol. 96, pp. 227-242, 2019.
T. Shapira and Y. Shavitt, “A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding,” in Proceedings of the Workshop on Network Meets AI & ML, 2020, pp. 35-41.
“Machine Learning Support for Cyber Threat Attribution at FireEye,” https://www.fireeye.com/blog/products-and-services/2020/06/machine-learning-support-for-cyber-threat-attribution-at-fireeye.html
D. Ott, C. Peikert, et al., “Identifying research challenges in post quantum cryptography migration and cryptographic agility,” arXiv preprint arXiv:1909.07353, 2019.
“Post-Quantum Cryptography PQC,” https://csrc.nist.gov/projects/post-quantum-cryptography
M. Schuld, I. Sinayskiy and F. Petruccione, “An introduction to quantum machine learning,” Contemporary Physics, doi:10.1080/00107514.2014.964942