This is the final report, and this is the presentation summary. Follow the second collaborative observation!
Final, complete, and clean dataset. A few errors were present in v2; this CSV v3 guarantees that the author and video name are present in 100% of the entries.
TL;DR, first analysis: on Tableau you can download the file in CSV v2 or JSON v2. Below are some primary findings. We answered a few basic research questions, and the visualization below might help you understand what the dataset contains.
This might be because recommendations work only for logged-in profiles, because of the nature of the old video, or because a single video was not enough.
new contributor input: processing 6 evidence [5 minutes] +4ms
scripts:potest-1-generator + 0 (partial 2) counter left at 2 session 1 session size: 6 +1ms
Above, zero valid sessions were saved: across 6 pieces of evidence, the contributor did not comply with the requested sequence.
new contributor input: processing 25 evidence [9 minutes] +10ms
scripts:potest-1-generator + 2 (partial 1) counter left at 1 session 3 session size: 25 +10ms
Above, a contributor provided 25 pieces of evidence. We extracted two sessions, and the “9 minutes” means that the contributor’s first evidence was recorded 9 minutes before the last one.
new contributor input: processing 7 evidence [5 minutes] +4ms
scripts:potest-1-generator + 1 (partial 0) counter left at 0 session 2 session size: 7 +3ms
Above, the most commonly observed behavior: a contributor providing one session of observation, and only that.
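As a rough illustration of the time span shown in square brackets in the log lines above, the sketch below computes the minutes elapsed between the earliest and the latest evidence of one contributor. It is not the project's actual code: the function name `spanInMinutes` and the sample data are hypothetical, and it assumes that each evidence carries a `savingTime` timestamp (the field used in the database queries in this post) and that the result is rounded to whole minutes.

```javascript
// Hypothetical helper, not the project's implementation.
// Returns the minutes between the earliest and the latest evidence.
function spanInMinutes(evidences) {
  if (evidences.length === 0) return 0; // nothing submitted, no span
  // Assumption: every evidence has a parseable `savingTime` value.
  const times = evidences.map((e) => new Date(e.savingTime).getTime());
  const first = Math.min(...times);
  const last = Math.max(...times);
  return Math.round((last - first) / 60000); // milliseconds to minutes
}

// Made-up sample: three evidences spread over nine minutes.
const sample = [
  { savingTime: "2020-01-19T10:00:00Z" },
  { savingTime: "2020-01-19T10:04:30Z" },
  { savingTime: "2020-01-19T10:09:00Z" },
];

console.log(spanInMinutes(sample)); // 9
```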
Unfortunately, part of the tracking.exposed team has been busy travelling and developing some code. This means that we don’t yet have the CSV to share, but the following is an initial breakdown. As an introduction to our system, which is based on MongoDB, our data collections are structured as follows:
Gross entry counts (some entries might have been collected after the test was over, and we have to decide whether to consider them or not):
> db.getCollection('htmls').count({href: "https://www.pornhub.com/", savingTime: { $gte: new Date("2020-01-19") }})
1027
> db.getCollection('htmls').count({href: "https://www.pornhub.com/recommended/", savingTime: { $gte: new Date("2020-01-19") }})
1031
The saved HTML is heavily duplicated: an observation might send the same HTML up to 5 times, and every copy contributes to the same metadataId. This is why we have to look at the number of unique publicKey values (the publicKey is the user identifier):
db.getCollection('htmls').distinct('publicKey', {href: "https://www.pornhub.com/recommended", savingTime: { $gte: new Date("2020-01-19") }})
89 elements. Maybe not all the participants followed the script correctly, and perhaps our parser did not work in all the possible scenarios, but as a general indication, we gathered 89 individual observations in our first test.
Is it bad or good? It is, of course, good news. Is it representative? Of course it isn’t. Thankfully, it doesn’t matter yet, since our first goal is to test our team and the tool, and to start measuring the divergence between similar PH experiences.
The parser for the recommended page is supported now! Each suggested video will be extracted in the following data format:
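To illustrate why distinct counts matter here, the sketch below counts unique values of a field across a list of saved records, which is what the `distinct` query does on the database side. The helper `countUnique` and the sample records are hypothetical; only the field names `publicKey` and `metadataId` come from this post.

```javascript
// Hypothetical helper: number of distinct values of `field` in `records`.
function countUnique(records, field) {
  return new Set(records.map((r) => r[field])).size;
}

// Made-up sample: userA sent the same HTML twice (same metadataId),
// then a different page; userB sent a single page.
const htmls = [
  { publicKey: "userA", metadataId: "m1" },
  { publicKey: "userA", metadataId: "m1" },
  { publicKey: "userA", metadataId: "m2" },
  { publicKey: "userB", metadataId: "m3" },
];

console.log(htmls.length);                     // 4 gross entries
console.log(countUnique(htmls, "metadataId")); // 3 distinct pages
console.log(countUnique(htmls, "publicKey"));  // 2 distinct contributors
```

The gross count overstates participation, while the number of distinct `publicKey` values approximates the number of contributors.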
{
"order": 2,
"duration": "35:55",
"publicationRelative": "2 months ago",
"views": "4.3M",
"viewString": "4.3M views",
"title": null,
"href": "/view_video.php?viewkey=ph5d6b197f4129b",
"videoId": "ph5d6b197f4129b",
"authorLink": "/model/nyna-ferragni",
"authorName": "Nyna Ferragni",
"thumnail": "https://ci.phncdn.com/videos/201912/16/269034501/thumbs_10/(m=eafTGgaaaa)(mh=oEChYH3QEwc4Iyfh)8.jpg"
}
The JSON object above is the second video snippet from a recommended page. We will now produce and share the final CSV as part of this experiment.
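Since `duration` and `views` are stored as display strings, anyone analyzing the CSV may want numeric values. The following is a minimal sketch of such a conversion, not part of our parser: the function names are hypothetical, and the `K` and `B` suffixes are assumptions (only `M` appears in the sample above). Strings in any other shape, for example with thousands separators, return `null`.

```javascript
// Hypothetical helpers for post-processing the extracted fields.

// "35:55" -> 2155 seconds; also handles "h:mm:ss" (assumed format).
function durationToSeconds(duration) {
  return duration
    .split(":")
    .reduce((total, part) => total * 60 + Number(part), 0);
}

// "4.3M" -> 4300000. Suffixes K and B are assumptions.
function viewsToNumber(views) {
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const match = /^(\d+(?:\.\d+)?)\s*([KMB])?$/i.exec(views.trim());
  if (!match) return null; // unrecognized format
  const value = parseFloat(match[1]);
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  return Math.round(value * factor);
}

console.log(durationToSeconds("35:55")); // 2155
console.log(viewsToNumber("4.3M"));      // 4300000
```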
…Regarding the Chrome extension, it definitely seems to be blocked at the moment:
Whatever, it is just one more reason to consider Firefox better.
Although Google didn’t enable our extension, a bunch of Firefox adopters supported our project:
This test will work only with our Firefox extension, because Google has put our extension under review, and therefore it is not accessible to the public:
On Sunday, January 19th, 2020: join the first collective observation of the #pornhub algorithm!
We are the tracking.exposed team and our main objective is to put a spotlight on users’ tracking, profiling, and the wider data market by performing an open algorithmic analysis. We believe that, as long as the operation of recommendation systems remains obscure, the many side effects of the platform economy cannot be tackled as they should be.
After the development of the infrastructure and analysis tools facebook.tracking.exposed and youtube.tracking.exposed, we decided to take inspiration from the 34th rule of the internet: There is porn of it. No exceptions.
So, we are now also focusing on the online porn giant PornHub, trying to unpack the hidden logic of user profiling!
We do it because personalization algorithms have the potential to shape public perception, and Pornhub claims to implement some kind of personalization!
To do this, we need your help! Join our global test and explore for yourself how the Pornhub experience varies between users who are performing the same actions. You have to follow an 8-step script (hard work for a Sunday!). After the test is completed and the evidence analyzed, we will release the dataset, along with some research and a final report, around January 30th, 2020.
You can find our browser extension on the pornhub.tracking.exposed website: it is necessary in order to participate, and it collects what Pornhub sends to you. Participation is anonymous, the extension can be used in Incognito/Private mode if you allow it, and for extra safety, you can even do the test with a browser you do not normally use (the extension runs in both Firefox and Chrome).
The test is simple — just follow the script you’ll find at this address: https://pornhub.tracking.exposed/potest/1. All you have to do is to click on a few links, and we’ll do the rest: by comparing multiple observations coming from our participants, we’ll measure how frequently personalized changes happen.
The test is not safe for work! We kindly ask you to watch the two videos that we selected after months of exhausting research, but if you don’t like them, don’t worry! Twenty seconds are enough for the extension to collect data and send it back to us. Of course, you retain full control of the evidence you send to us: by clicking on the extension icon, you can open your personal page, where you can delete everything you sent us, if you wish, or play with our basic data analysis functionality.
Thank you, The pornhub.tracking.exposed team