Dear All,
Happy New Year to all of you. I hope you are all doing well and had great holidays.
Hopefully you are not being hit too hard by the arctic temperatures in the US.
In October I told you I'd like to submit a paper from our group to the Internet of Things World Forum 2014.
Together with Ning I submitted an extended abstract, and it was accepted. Now I have to turn it into a full paper.
One of the topics will be privacy.
So I discussed the question "How to address the privacy issue in the IoT" with a colleague.
I'd like to hear your opinions and thoughts about it. Here is what I would like to propose:
Governance of data and Privacy
The IoT will lead to an increasing number of data sources producing a tremendous number of messages in the network. A single data item from a sensor might not affect a person's privacy. But data could in principle be combined or correlated to yield valuable information about a person: their behavior, their personal situation, or other private details.
For example, a truck of a logistics company might be equipped with a GPS tracker, velocity sensors, and other on-board diagnostic devices. The GPS data allow all movements of the driver to be tracked. If these data are publicly available, they can be combined with the driver's home address. When current GPS data prove that he is far away from home, this might be invaluable information for burglars and other criminals. So the truck driver is interested in hiding the data of the GPS tracker. On the other hand, these data might be interesting to the logistics company for statistics and timetable optimization. Communities and local authorities might also be interested in real-time traffic information, so they too have an interest in using these data. In short, the driver and, e.g., the communities have different claims on the same data.
The claims approach
The example shows that different actors in a scenario might have different claims on the data produced by a certain data source. A simplified data flow in this approach starts with the collection of data in a sensor; that is the data source. The data are transmitted to a data sink, that is, a device or an application where the data are finally consumed and evaluated. The data sink is tightly related to one of the actors in a scenario. One or more intermediate devices (e.g. routers, gateways) might relay the data.
In our example we have a data flow from a sensor (the GPS tracker) through the Internet to an application or database acting as the data sink (the servers or applications of the logistics company or the community, or a smart device of the driver). The data flow passes through several routers and other equipment of the transport network as intermediates.
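To make the terms concrete, here is a minimal sketch of how the data flows of the truck example could be modelled. The class name `DataFlow`, its fields, and the device names are only illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFlow:
    """One flow from a data source to a data sink (hypothetical model)."""
    source: str                      # where the data are collected
    sink: str                        # where the data are consumed and evaluated
    actor: str                       # the actor the sink is tightly related to
    intermediates: List[str] = field(default_factory=list)  # relaying devices

    def path(self) -> List[str]:
        """Return every hop of the flow, from source to sink."""
        return [self.source, *self.intermediates, self.sink]


# The same GPS tracker feeds three sinks, each belonging to a different actor.
hops = ["gateway", "router"]  # assumed intermediates, for illustration only
gps_flows = [
    DataFlow("gps_tracker", "driver_smartphone", "driver", hops),
    DataFlow("gps_tracker", "logistics_server", "logistics_company", hops),
    DataFlow("gps_tracker", "traffic_portal", "community", hops),
]

for flow in gps_flows:
    print(flow.actor, "<-", " -> ".join(flow.path()))
```

The point of the sketch is only that one data source can appear in several flows, and that each flow ends at a sink tied to exactly one actor.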
Now we apply certain operations to the data along their flow, depending on the claim of a certain actor in the scenario. What does this mean for our example? The truck driver claims secrecy of the GPS data: he wants only himself to be able to see the data. So the flow from the GPS tracker to the driver's smartphone should be end-to-end encrypted. The logistics company claims visibility of the data. This need not conflict with the driver's claim: the solution here is to provide anonymized data to the company. Certain data about the day and the driver might be deleted, while position data and average velocity remain valuable for the company's timetable optimization.
The claims approach has a few basic operations on data that are suitable for satisfying the claims of different actors in a use case. Secrecy could be achieved by end-to-end encryption between the sensor and the data sink, so that no intermediary is able to read the data. Anonymity could be provided by filtering out a subset of the data. Data could be avoided altogether by discarding them directly at the sensor.
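The three basic operations could be sketched roughly as follows. The field names, the set `IDENTIFYING_FIELDS`, and all function names are assumptions for illustration. Note that the secrecy operation here only marks the reading for encryption; a real system would encrypt the payload with a key that only the data sink holds.

```python
# Fields assumed to identify the driver or the day (illustrative choice).
IDENTIFYING_FIELDS = {"driver_id", "date"}


def anonymize(reading):
    """Anonymity: filter out the identifying subset of the data."""
    return {k: v for k, v in reading.items() if k not in IDENTIFYING_FIELDS}


def discard(reading):
    """Avoidance: drop the reading directly at the sensor."""
    return None


def mark_for_encryption(reading, sink):
    """Secrecy (placeholder): tag the reading for end-to-end encryption.

    No cryptography is performed here; this only records which sink
    should be the sole party able to read the payload.
    """
    return {"encrypt_for": sink, "payload": reading}


reading = {
    "driver_id": "D-17",
    "date": "2014-01-08",
    "lat": 48.1,
    "lon": 11.6,
    "avg_speed_kmh": 62,
}

for_company = anonymize(reading)            # position and speed only
for_driver = mark_for_encryption(reading, "driver_smartphone")
```

With these made-up values, the company still receives position and average speed for timetable optimization, but nothing that ties the reading to a driver or a day.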
Now the different claims of the different actors in a use case can be analyzed, and an appropriate data operation can be chosen to satisfy each claim.
But how do we proceed with conflicting claims? The goal of this approach is to support architects in designing configurable, privacy-enabled IoT infrastructures. The configuration depends on use cases, scenarios, rules, national laws, and cultural context. In some countries the most restrictive claim might be enforced; in our example, it could be prohibited to track an employee's vehicle at all. In other countries it could be legal and normal to use GPS trackers in that way, and there might also be no obligation to anonymize the data. A system could be configured either way, as long as it is able to apply the necessary operations to the data it produces.
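One way such a configuration could look is sketched below. The policy names `most_restrictive` and `actor_decides`, the function `resolve_claims`, and the ordering of claims by restrictiveness are all my own assumptions; the actual rules would come from the legal environment of the deployment.

```python
# Assumed ordering of claims from least to most restrictive.
RESTRICTIVENESS = {"visibility": 0, "anonymity": 1, "secrecy": 2, "avoidance": 3}


def resolve_claims(claims, policy, preferred_actor=None):
    """Pick the claim to enforce for one data source under a configured policy.

    claims maps an actor name to the claim that actor makes on the data.
    """
    if policy == "most_restrictive":
        # e.g. a country where the strictest claim must be enforced
        return max(claims.values(), key=lambda c: RESTRICTIVENESS[c])
    if policy == "actor_decides":
        # e.g. a country where one designated actor's claim takes precedence
        return claims[preferred_actor]
    raise ValueError(f"unknown policy: {policy}")


gps_claims = {"driver": "secrecy", "logistics_company": "visibility"}

strict = resolve_claims(gps_claims, "most_restrictive")
permissive = resolve_claims(
    gps_claims, "actor_decides", preferred_actor="logistics_company"
)
print(strict, permissive)  # secrecy visibility
```

The system itself stays the same in both cases; only the configured policy decides which of the basic operations is applied to the GPS data.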
The advantage of this approach is that we propose to consider implementing the basic operations for every data flow in a system and every claim of an actor in a scenario. Configuring which operation to use when should be done by the administrators or users in their specific domain, country, and legal environment. So the final decision on which data are private is not up to the technical architects, although they enable the system to support all possible configurations.