Ernesto Damiani @ 13th Extended Semantic Web Conference (ESWC)
In many Big Data environments, information is made available as huge data streams that are collected and analyzed at different locations, asynchronously, and under the responsibility of different authorities. Data analysts are now commonly mandated to compute Big Data analytics without holding the right to access the individual data points in the input, because those data points may contain sensitive information or personal data protected by privacy regulations. This talk discusses the idea that techniques used for the semantic enrichment of Big Data can act as non-linear boosters of leakage and privacy risk. Examples include semantic lifting, which harmonizes metadata representation across data collection points, and pre-joins at data ingestion time, which avoid computing semantic joins on Big Data storage. Intuition suggests that applying semantic techniques to Big Data representation may affect security risk in two ways: (1) it may increase leakage risk, by raising the value to the attacker of each unit of leaked information; and (2) it may increase intrusion risk, by making injection attacks (i.e., attacks that poison data in order to subvert the outcome of analytics) more effective per unit of poisoned information injected. However, no clear methodology is currently available for quantifying the impact of these boosters.
The video of the talk is available here.