In testing GDPR's "Right of Access", a reporter received 138 GB of raw personal data from Apple, Amazon, Facebook and Google, most of which was very difficult to analyze.


If the many technological scandals of recent years have taught us anything, it is that technology companies hold a truly terrifying amount of data on us all. In addition to being invasive, this data can be downright dangerous if it falls into the wrong hands.

Europe's response to this risk, put in place under the General Data Protection Regulation (GDPR), is the "right of access". This right states that, upon request, any company must be willing to provide you with your personal data. It should do so in a way that is easy to read, timely, and comprehensive enough for you to understand how the company obtained the data and how it uses it. The idea is that once you know what data a company holds about you, you can make informed decisions about whether you want to keep providing it, and hold the company responsible when it collects data without your consent.

Data from Apple, Amazon, Facebook and Google has been downloaded and reviewed

The problem is that companies can often be very stingy about providing this data. After all, if your service essentially relies on "forced consent" (for which Google was recently fined 50 million euros), you may not want your users to easily see how much personal data you collect.

I decided to test the "right of access" offered by four of the largest technology companies operating in the European Union: Apple, Amazon, Facebook and Google. What I found suggests that while you can get the raw data, understanding it is another matter, which makes it difficult to make informed decisions about your data.

According to the UK's data protection regulator, the ICO, companies must provide, upon request, all personal data, defined as data relating to an identified or identifiable person. The information must be provided to the person in a form that is "concise, transparent, intelligible and easily accessible, in clear and plain language" in a "commonly used electronic format." This sounds simple enough, but how does each of the four technology giants measure up?

Downloading my data in the first place was easy. Google's and Apple's data download services let you choose the data you want to download. Facebook's does not, but all three services are easy to find on their respective websites, and the data arrives quickly. Amazon, meanwhile, does not present the request as an easy-to-find option on its site. Getting a single link to all of your Amazon data depends on searching the site's "Contact Us" page for the right option, which is hidden at the end of a list. Once I asked, it took 30 days to receive a link to download my data (the limit imposed by the regulations).

When the time came to review the data I had received, however, things got complicated. Some files were ambiguously labeled, while others were stored in formats that tested the boundaries of what is "commonly used". In fact, working out what data I was looking at was not as simple as it should have been.

Google's location data was particularly difficult to understand. The company has been repeatedly criticized for tracking Android users even after they disable the main location tracking option in the operating system. Consumer groups in seven European countries have lodged complaints with their data protection watchdogs. Downloading your data with the help of GDPR should be a way to check that a service does not use such tricks to gather more data than it should. It should be a way to hold companies like Google to account.

Google admitted that it was tracking you even if you turned off Location History. Photo by Chris Welch / Grouvy Today

But when you look at the data, it's very difficult to visualize and understand that information. All of Google's location data was contained in a single 61MB JSON file. Opening it with Chrome revealed an overwhelming number of fields titled "timestampMs", "latitudeE7" and "longitudeE7", along with estimates of whether I was sitting still or in some form of transport (I guess).

I have no doubt that this is all the location history information that Google has associated with my account, but without context, the data makes no sense. It's a series of numbers that I would have to make a serious effort to even begin to understand, and that I would need to import into other software to analyze properly. If the goal of GDPR is to give users more control and a better understanding of what data is collected from them, this part of Google's download has little to offer. JSON files are great if you want to ingest data into another system, but they are less useful if you want to gauge how much data Google has about you and make informed data privacy decisions.
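To give a sense of the effort involved, here is a minimal Python sketch of how such records might be decoded into readable values. It rests on several assumptions that the download itself does not document: that the records sit under a top-level "locations" key, that the longitude field is spelled "longitudeE7", that "timestampMs" counts milliseconds since the Unix epoch, and that the E7 fields are degrees multiplied by ten million. The sample record is invented for illustration.

```python
import json
from datetime import datetime, timezone


def decode_record(record):
    """Turn one raw location record into readable values.

    Assumes timestampMs is milliseconds since the Unix epoch and that the
    E7 fields are degrees multiplied by 10**7 (assumptions, not something
    the export explains).
    """
    when = datetime.fromtimestamp(int(record["timestampMs"]) / 1000, tz=timezone.utc)
    return {
        "time_utc": when.isoformat(),
        "latitude": record["latitudeE7"] / 1e7,
        "longitude": record["longitudeE7"] / 1e7,
    }


def decode_export(text):
    """Decode every record in an export; the top-level "locations" key is assumed."""
    data = json.loads(text)
    return [decode_record(r) for r in data.get("locations", [])]


# Hypothetical sample standing in for the real 61MB file.
sample = (
    '{"locations": [{"timestampMs": "1546300800000", '
    '"latitudeE7": 515073510, "longitudeE7": -1277580}]}'
)
rows = decode_export(sample)
print(rows[0])
```

Even with the numbers decoded, the result is still only a list of coordinates and times. Making sense of it would mean plotting the points on a map or grouping them by day, which is more than most people will do.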

Google should do more to explain the nature of this data.

When it came to other files, it was not even clear what data I was looking at. A 4 GB HTML file titled "My Activity", located in the "Ads" folder, presumably shows me something about the ad tracking data Google has collected about me, but there are no annotations or metadata to explain it.

These are, by far, the most confusing files in the entire data download, and they are also the most important. They contain the kind of personal information that potential advertisers would kill for, and Google should work harder to explain what they are. It already provides an HTML index file to give you an overview of your data, so why not include information about the contents of each file?

Apple was more successful than Google in presenting its data, even though there were still problems. First impressions were very positive. Most of the data provided by Apple was in file types that were easy to read and understand, such as CSV, TXT, and JPG, with only a few JSON files that could complicate things.

But once you are inside these files, a lot of the information is still difficult to understand. A file titled "Apple ID Account Information" seemed to contain 11 almost identical records regarding my Apple account, all created on exactly the same date in 2014, with no explanation of what they were. Another CSV file, with the ambiguous title "Apps and Service Analytics", seems to contain a complete list of each of my searches in the App Store, but it contains so many empty cells that I only noticed it held any data at all when I saw its 6.7MB file size.
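One way to check whether a sparse spreadsheet like this holds anything is to count its non-empty cells rather than scroll through it. The Python sketch below is a rough illustration only. The column names and sample rows are hypothetical, not taken from Apple's file, and it assumes the first row is a header with unique column names.

```python
import csv
import io


def fill_report(csv_text):
    """Return (non_empty_cells, total_cells, per-column non-empty counts)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumes the first row names the columns
    counts = {name: 0 for name in header}
    total = 0
    filled = 0
    for row in reader:
        for name, cell in zip(header, row):
            total += 1
            if cell.strip():  # whitespace-only cells count as empty
                filled += 1
                counts[name] += 1
    return filled, total, counts


# Hypothetical sample: mostly empty rows with a little data in between.
sample = "date,search_term,device\n2018-05-01,,\n,,\n2018-05-03,weather app,\n"
filled, total, counts = fill_report(sample)
print(f"{filled} of {total} cells contain data")
print(counts)
```

A per-column count like this at least shows which columns are worth reading, though it says nothing about what the values mean.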

Despite the unease of being able to listen back to all of my Alexa requests, Amazon did a better job of presenting its data, although this may be because it holds relatively little data about me. For the most part, the files and folders were clearly labeled, although the company still has work to do to better label the contents of its spreadsheets.

Ironically, Facebook had the most understandable data of the four services. For starters, every file that Facebook gives you is an HTML file. Each is sorted into its own clearly labeled folder, and an index file gives you a preview of what each document contains. The files themselves are clearly arranged and formatted, and browsing them feels like browsing a page on Facebook itself, even though everything is stored locally on your computer.

The Facebook download includes a long index file that tells you where to find all your information.

It's still terrifying to see the amount of data that Facebook has stored on you (and that's before getting into the cases of people who have found records of all their old calls and SMS messages), but at least you're well aware of what this information is, rather than having to guess based on the contents of each file.

By the end of my experiment, I had a little under 138 GB of data from the four services I contacted. I had 1.1 GB from Facebook, 392 MB from Amazon and 254 MB from Apple. Although Google's bulk download came to 72.5 GB, it was mostly my Google Drive and Google Photos backups, which came to 44.3 and 25.7 GB, respectively. The rest of my Google data came to just 2.5 GB.

After trying to make sense of it all, it is clear that these companies, as well as the GDPR regulations that govern them, still have a long way to go if they want to give us real control over our data. Being able to download it is one thing, but making it useful means working harder so that what is downloaded is easier for everyone to understand.

At a minimum, that means providing a better index to tell you what data is contained in which file, but it also means organizing the contents of those files so that they make more sense on their own.
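Until the companies provide such an index, a user can at least build a crude one. The Python sketch below walks a download folder and lists each file's path, type and size, largest first. It is only an illustration, and the folder layout and file names in the demonstration are hypothetical. It cannot say what the data in each file means, which is the part only the company can supply.

```python
import os
import tempfile


def build_index(root):
    """Walk an export folder and list each file's relative path, type and size."""
    entries = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            ext = os.path.splitext(name)[1].lower().lstrip(".") or "unknown"
            entries.append((os.path.relpath(path, root), ext, os.path.getsize(path)))
    # Largest files first, since size is often the only hint that a file matters.
    return sorted(entries, key=lambda e: e[2], reverse=True)


# Demonstration on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Ads"))
    with open(os.path.join(root, "Ads", "MyActivity.html"), "w") as f:
        f.write("x" * 500)
    with open(os.path.join(root, "locations.json"), "w") as f:
        f.write("{}")
    index = build_index(root)
    for rel_path, kind, size in index:
        print(f"{size:>8} bytes  {kind:<6} {rel_path}")
```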