The ability to share, manage, distribute, and access data quickly and remotely is at the basis of the digital revolution that started several decades ago. The role of data in today’s technology is even more important now that we have entered the so-called data-driven economy. Data management and the inferences drawn from data are fundamental for any enterprise, from micro to large, to create value and compete in the global market, and they have taken over the central role traditionally held by communication means. The data domain has undergone important changes at all layers of the IT chain: i) data layer: from data to big data; ii) database layer: from SQL to NoSQL; iii) platform layer: from the data warehouse and DBMS to Big Data platforms; iv) analytics layer: from data mining to machine learning and artificial intelligence. Data mining focuses on discovering unknown patterns and relationships in large data sets. Machine learning also aims to discover patterns in data, but it learns the pattern parameters directly from the data: it includes a training step, and the algorithm is not explicitly programmed to handle such patterns. It builds and maintains a model of the system behavior. Artificial intelligence mimics human intelligence and tries to reason on data to produce new knowledge.
We now discuss the threats that can be mapped to the (Big) Data asset taxonomy. In general, threats such as network outages or malfunctions of the supporting infrastructure may heavily affect Big Data. In fact, since Big Data comprises millions of data items and each item may be stored in a separate physical location, this architecture relies heavily on the interconnections between servers. Physical attacks (deliberate and intentional), natural and environmental disasters, and failures/malfunctions (e.g., malfunction of the ICT supporting infrastructure) are instead less critical, since their effects are strongly mitigated by the intrinsic redundancy of Big Data; nevertheless, Big Data owners deploying their systems in private clouds or other on-premise infrastructure should take these threats into serious consideration. Data are compromised at huge rates: more than 25 million records were compromised in the first semester of 2018,[1] and the cost of data breaches increased by 6.4% in 2018. Social media accounts for the largest number of breached records, while healthcare leads in the number of incidents. The average cost of a data breach rose to $3.9 million, the average number of breached records by country was 25,575, the cost per lost record was $150, and the average time to identify and contain a breach was 279 days.[2]
According to the ENISA Big Data Threat Landscape,[3] a threat to a Big Data asset can be considered as “any circumstance or event that affects, often simultaneously, big volumes of data and/or data in various sources and of various types and/or data of great value”. Such threats can be further divided into a Big Data Breach, when “a digital information asset is stolen by attackers by breaking into the ICT systems or networks where it is held/transported”, and a Big Data Leak, namely “the (total or partial) accidental disclosure of a Big Data asset at a certain stage of its lifecycle due to inadequate design, improper software adaptation or when a business process fails”. A Big Data Breach involves malicious attacker behavior resulting in unauthorized access, while a Big Data Leak involves an honest-but-curious attacker or an observer.
The threat taxonomy consolidates threats previously considered in other documents/reports[4] and consists of the following threats.
Threat T4.1.1: Information leakage/sharing due to human errors
Human errors are among the most critical threats in today’s complex environments.[5][6][7] They are accidental threats, meaning that they are not intentionally posed by humans, and are due to misconfiguration, clerical errors (for example, pressing the wrong button), misapplication of valid rules (poor patch management, weak passwords), and knowledge-based mistakes (software upgrades and crashes).
Assets: “Data”, “Infrastructure”.
Threat T4.1.2: Inadequate design and planning or incorrect adaptation
Inadequate design and deployment of a Big Data platform, including its adaptation, can threaten the managed data. For example, data replication, though often seen as a countermeasure to threat T4.4.2, can also become an attack vector if (one of) the replicas (storage nodes) is weak, or can simply increase the probability of data disclosure and data leaks. As another example, encrypted storage connected to a network that exchanges data in the clear could result in a data leak scenario. The design and deployment of the Big Data platform can therefore become a source of threats if not thoroughly tested and verified. An additional design-related threat is the lack of scalability of some tools, which is also connected to Threat T4.4.2 (Denial of Service).
Assets: “Data”, “Big Data analytics”, “Software”, “Computing Infrastructure models”, “Storage Infrastructure models”.
Threat T4.2.1: Interception of information
This threat considers an attacker intercepting the communication between two communicating parties. Interception makes it possible to hijack a user session or gain unauthorized access to services in social networks, and flaws in communication protocols can result in data breaches. Big Data software distributions (for example, Hadoop, Cassandra, MongoDB, Couchbase) do not always use protocols that protect data confidentiality and integrity between communicating applications (e.g., SSL/TLS), and they are not always configured properly (e.g., default passwords are not always changed).
Assets: “Data”, “Roles”, “Infrastructure”.
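As an illustration of the confidentiality and integrity protection mentioned above, the following is a minimal Python sketch of a client-side TLS configuration built with the standard ssl module. It assumes that the Big Data component accepts TLS connections; the helper names are hypothetical, and real products (e.g., Hadoop or MongoDB) have their own, product-specific TLS settings that are not shown here.

```python
import socket
import ssl


def make_client_context(ca_file=None):
    """Return a TLS client context that encrypts traffic and authenticates the server.

    ca_file: optional path to a PEM bundle with the CA that signed the cluster
    certificates (an assumption); by default the system trust store is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy SSL and early TLS versions with known weaknesses.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables these; set explicitly for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def open_tls_connection(host, port, ctx):
    """Open a TCP connection and wrap it in TLS; raises if the certificate is invalid."""
    sock = socket.create_connection((host, port), timeout=10)
    try:
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        # Do not leak the plain TCP socket if the TLS handshake fails.
        sock.close()
        raise
```

For example, open_tls_connection("bigdata-node.example", 9443, make_client_context()), with a hypothetical host and port, refuses to proceed if the node presents a certificate that does not validate or does not match the host name. Encryption defeats passive eavesdropping, while certificate and host-name verification defeats an interceptor placed in the middle of the link.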
Threat T4.2.2: Unauthorised acquisition of information (data breach)
Unauthorized acquisition of data following data breaches is also an important threat,[8] covering incidents that result in the compromise or loss of data. In addition, the GDPR in Europe is predicted to increase the number of extortion attacks: attackers will try to extort money by threatening victims with the GDPR penalties that would follow a data disclosure.[9]
Assets: “Data”, “Roles”, “Infrastructure”.
Threat T4.3.1: Data poisoning
The growing number of systems that make decisions based on collected data, and on inferences drawn from them, makes the trustworthiness of data critical. Data poisoning therefore becomes a fundamental threat to all systems whose processes and activities are built on data. Data integrity is not the only property to protect and guarantee: data provenance, non-repudiation, and accountability should also be provided.
Assets: “Data”, “Security and privacy techniques”, “Data management”, “Data privacy”.
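A minimal sketch of how integrity and provenance can be attached to individual records, using an HMAC-SHA256 tag from Python's standard hmac module computed over the record together with its declared source. The envelope layout and the field names source, payload, and tag are hypothetical, and a secure key-management system is assumed. Note that an HMAC only proves that a record was produced by a holder of the shared key; non-repudiation would instead require digital signatures.

```python
import hashlib
import hmac
import json


def _canonical_bytes(obj):
    # Deterministic serialization: the same content always yields the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def tag_record(record, source_id, key):
    """Wrap a record with its source (provenance) and an HMAC over both."""
    envelope = {"source": source_id, "payload": record}
    envelope["tag"] = hmac.new(key, _canonical_bytes(envelope), hashlib.sha256).hexdigest()
    return envelope


def verify_record(envelope, key):
    """True only if neither the payload nor the declared source was altered."""
    unsigned = {k: v for k, v in envelope.items() if k != "tag"}
    expected = hmac.new(key, _canonical_bytes(unsigned), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, envelope.get("tag", ""))
```

A consumer would call verify_record on every incoming envelope and discard records that fail, so that a poisoned value injected after tagging, or a record whose declared source was rewritten, never reaches the analytics.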
Threat T4.3.2: Model poisoning
Model poisoning aims to corrupt machine learning models by poisoning the data (Threat T4.3.1) used to train them. The idea is that if an attacker can poison the training data, the resulting model will represent a behavior different from the real and correct behavior of the target system.
Assets: “Data”, “Data Analytics”.
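The effect described above can be shown with a toy example. The sketch below trains a nearest-centroid classifier (chosen only for its simplicity; it stands in for any learned model) first on clean data and then on the same data plus a few injected, deliberately mislabeled samples, a simple form of data poisoning. All data values are made up for illustration.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """samples: list of (feature_tuple, label). Returns {label: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in sums[y]) for y in sums}


def predict(centroids, x):
    """Assign x to the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Clean training data: class 0 around 2.0, class 1 around 8.0.
clean = [((1.0,), 0), ((2.0,), 0), ((3.0,), 0),
         ((7.0,), 1), ((8.0,), 1), ((9.0,), 1)]
# Attacker-injected samples: class-0-looking points labeled as class 1.
poison = [((1.0,), 1)] * 3

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)
print(predict(clean_model, (4.0,)), predict(poisoned_model, (4.0,)))
```

Here the three injected points drag the class-1 centroid from 8.0 to 4.5, so the input 4.0, which the clean model assigns to class 0, is assigned to class 1 by the poisoned model, even though no existing training sample was modified.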
Threat T4.4.1: Identity fraud
Identity fraud is the leading type of data breach.[6] Access credentials are in fact among the most critical data managed by Big Data platforms. They are used to access personal accounts possibly containing highly sensitive information such as credit card numbers and payment and billing details. Personal data are often coupled with profiling data such as user preferences and habits. These data are often used for impersonation fraud, creating big opportunities for identity thieves.[7] In this context, where social networking is part of everyday life, social engineering regains importance and becomes the basis for new attacks.
Assets: “Data”, “Infrastructure”.
Threat T4.4.2: Denial of service
Traditional (Distributed) Denial of Service attacks are among the main threats to complex Big Data platforms. They threaten, on the one hand, the availability of components, by exhausting their resources and causing performance degradation, loss of data, and service outages, and, on the other hand, the availability of data.
Assets: “Infrastructure”.
Threat T4.4.3: Malicious code/software/activity
This class of threats usually targets the whole ICT stack. These threats aim to distribute and execute malicious code/software, or to carry out malicious activities, that target the confidentiality, integrity, and availability of data. They usually involve malware, exploit kits, worms, and trojans, and exploit backdoors, trapdoors, and developer errors/weaknesses. Malicious software also targets distributed programming frameworks, which rely on parallel computation and may include untrusted components.
Assets: “Data”, “Software”, “Computing infrastructure models”.
Threat T4.4.4: Generation and use of rogue certificates
This class of threats usually targets the whole ICT stack. These threats use rogue certificates to access Big Data assets and communication links, causing data leakage, data breaches, and misuse of the brand, and to upload/download malware or force updates (see Threat T4.4.3).
Assets: “Data”, “Big Data analytics”, “Software”, “Hardware”.
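One commonly used countermeasure against rogue certificates is certificate pinning: a client accepts a server only if the certificate it presents matches a fingerprint distributed in advance, so a rogue certificate that still chains to a trusted CA is rejected. The following is a minimal Python sketch, assuming that the pinned SHA-256 fingerprints are provisioned out of band; the function names are hypothetical. In a live TLS connection, the DER bytes would come from SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
import hmac


def sha256_fingerprint(der_cert):
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def normalize(fp):
    """Accept both 'AB:CD:...' and 'abcd...' fingerprint notations."""
    return fp.replace(":", "").strip().lower()


def is_pinned(der_cert, pinned_fingerprints):
    """True only if the presented certificate matches one of the pinned fingerprints."""
    presented = sha256_fingerprint(der_cert)
    return any(hmac.compare_digest(presented, normalize(p)) for p in pinned_fingerprints)
```

Pinning complements, rather than replaces, normal certificate validation, and the pinned fingerprints must be updated whenever certificates are legitimately rotated; otherwise legitimate nodes are locked out.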
Threat T4.4.5: Misuse of assurance tools
Assurance is the way to gain justifiable confidence that IT systems will consistently demonstrate one or more security properties, and operationally behave as expected, despite failures and attacks[10][11]. Assurance relies on audit, certification, and compliance tools and techniques[12]. Manipulating such tools and techniques can result in scenarios where the malicious behavior of attackers is masked and goes undetected. Assurance information is necessary to ensure the security of the system during its entire lifecycle, from design to operation, and to guarantee compliance with regulations.
Assets: “Security and Privacy Techniques”, “Data”, “Infrastructure”.
Threat T4.4.6: Failures of business process
According to the ENISA taxonomy,[13] improper business processes can damage or cause a loss of assets. This class includes threats to the confidentiality of data (e.g., incorrect anonymization) and to its integrity (e.g., incorrect management of replicas, which can lead to Big Data degradation and increase the risk of inconsistent data). This threat is also related to threats to business processes in the other domains, especially system-centric security and application-centric security.
Assets: “Data”, “Big Data Analytics”.
Threat T4.4.7: Code execution and injection (unsecured APIs)
Big Data applications are often built on web service models; their APIs can therefore become a target and be vulnerable to well-known attacks, such as those in the Open Web Application Security Project (OWASP) Top Ten list[14]. In particular, code execution (e.g., XSS) and injection (e.g., SQL injection) are critical classes of attacks that increase risk. Attacks on and breaches of web applications often result in larger data breaches.[15]
Assets: “Data”, “Storage Infrastructure models”.
Threat T4.5.1: Violation of laws or regulations
Legal aspects are pertinent to the management of a Big Data system, and their violation can therefore be considered a threat to the system itself. In this respect, the GDPR and the Free Flow of Non-Personal Data Regulation, for instance, dictate, among other things, how organizations are expected to handle personal data, who is ultimately responsible for the protection of personal data in the context of complex supply chains, which obligations apply to mixed data sets of both personal and non-personal data, and how to mitigate risks (e.g., for profiling).
Assets: All assets.
Threat T4.6.1: Skill shortage
A possible shortage of skilled data scientists and managers is one of the main threats to Big Data.[16] This threat has a strong link to threat group TG4.1 “Unintentional damage/loss of information or IT assets”.
Assets: “Roles”.
Threat T4.6.2: Malicious insider
Insider threats are among the most critical security threats to be faced, and insiders can be divided into unintentional and malicious ones. It is widely held that insider attacks may inflict greater damage than attacks by outsiders.[17][18][19][20] Their impact is also increasing because, on the one hand, no effective security solutions exist for this threat and, on the other hand, the value of data is growing exponentially. Insiders are in fact authorized users with legitimate access to sensitive/confidential documents, possibly knowing existing vulnerabilities[17]. Malicious insiders therefore have multiple incentives, ranging from revenge to financial gain, to carry out an attack when sensitive data are at their disposal.
Assets: “Roles”, “Data”, “Infrastructure Security”, “Integrity and Reactive Security”.
The advent of COVID-19 further aggravated threat T4.6.1: Skill shortage. Furthermore, COVID-19 also generated the following new threats:
Threat T4.1.3: Information leakage/sharing due to hostile home network – COVID-19
This threat considers an attacker exploiting the impact of COVID-19 on businesses and people to profit from information leakage/sharing. In particular, it focuses on the need of people and employees to move their activities to remote and untrusted sites, which are usually less secure than their business-side counterparts.
Threat T4.2.3: Conversation Eavesdropping/Hijacking – COVID-19
This threat considers the increased risk of conversation eavesdropping and hijacking introduced, on the one hand, by the exponential rise in videoconferencing and, on the other hand, by the security gaps of videoconferencing tools.
Threat T4.3.3: Unreliable Data – COVID-19
The COVID-19 pandemic pushed to the extreme the problem of distinguishing reliable from unreliable information. People have been overloaded with information about the pandemic and with conflicting opinions from virologists, making it nearly impossible to understand the status of the crisis and leaving society vulnerable. On the technical side, criminals have gone beyond the simple data poisoning in T4.3.1 and have adapted existing cybercrime to fit the pandemic narrative [21], exploiting the uncertainty of the situation and making it even more critical with fake data and research experiments. This scenario is producing a substantial increase in social engineering activities, as well as in their success rate.
[1] WP2018 O.1.2.1 – ENISA Threat Landscape 2018 https://www.enisa.europa.eu/publications/enisa-threat-landscape-report-2018/
[2] Ponemon Institute’s Cost of a Data Breach Report 2019
[3] Baseline Security Recommendations for IoT, https://www.enisa.europa.eu/publications/baseline-security-recommendations-for-iot
[4] PINNING DOWN THE IOT https://fsecurepressglobal.files.wordpress.com/2018/01/f-secure_pinning-down-the-iot.pdf
[5] Information Leakage, http://projects.webappsec.org/w/page/13246936/Information%20Leakage, 2018.
[6] Trend Micro Security Predictions for 2018: Paradigm Shifts https://www.trendmicro.com/vinfo/my/security/news/threat-landscape/2018-trend-micro-security-predictions-paradigm-shifts
[7] Big data creates big opportunities for identity thieves: http://www.c4isrnet.com/story/military-tech/it/2015/01/19/big-data-identity-theft/22004695/
[8] Half of management teams lack awareness about BPC despite increased attacks https://www.helpnetsecurity.com/2018/12/07/business-process-compromise/
[9] Trend Micro Security Predictions for 2018: Paradigm Shifts https://www.trendmicro.com/vinfo/my/security/news/threat-landscape/2018-trend-micro-security-predictions-paradigm-shifts
[10] S. Chai, M. Kim, and H. Rao, “Firms’ information security investment decisions: Stock market evidence of investors’ behavior,” Decision Support Systems, vol. 50, no. 4, pp. 651-661, 2011.
[11] J. Sametinger and J. W. Rozenblit, “Security Challenges for Medical Devices,” Communications of the ACM, vol. 58, no. 4, pp. 75-82, 2015.
[12] A. Garg, J. Curtis, and H. Halper, “Quantifying the financial impact of IT security breaches,” Information Management & Computer Security, vol. 11, no. 2, pp. 73-84, 2003.
[13] See https://www.enisa.europa.eu/topics/threat-risk-management/threats-and-trends/enisa-threat-landscape/threat-taxonomy/at_download/file
[14] See https://www.justice.gov/usao-ndca/pr/sunnyvale-based-network-security-company-agrees-pay-545000-resolve-false-claims-act
[15] Global Threat Intelligence Report https://www.nttsecurity.com/gtir
[16] See for example reports from McKinsey http://www.mckinsey.com/features/big_data and from the Financial Times http://www.ft.com/cms/s/0/953ff95a-6045-11e4-88d1-00144feabdc0.html#axzz3ntU3lM00
[17] IPACSO Project, “Innovation Framework for ICT Security,” [Online]. Available: https://ipacso.eu/
[18] M. Brzoska, R. Bossong, and E. van Um, “Security Economics in the European Context: Implications of the EUSECON Project,” Economics of Security Working Paper Series, vol. 58, 2011.
[19] R. F. Trzeciak. 2017. SEI Cyber Minute: Insider Threats http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=496626
[20] PWC. 2017. Global Economic Crime Survey 2016: US Results. https://www.pwc.com/us/en/forensic-services/economic-crime-survey-us-supplement.html
[21] INTERNET ORGANISED CRIME THREAT ASSESSMENT (IOCTA) 2020, EUROPOL