This tool to study Pornhub's personalization algorithm is now DISCONTINUED.

After our global call on the 19th of January, we are glad to follow up and report that:

The test on PornHub’s algorithm went well enough. More than 100 supporters showed up (/impact). This is not enough to be representative, but it is a good start to test our process.
We collected 87 correct sequences (see the methodology section for our selection logic). We released the software and documented the data format: we’ll repeat the experiment soon.
We produced three versions of the CSV (updates and bugfixes) to allow other researchers to replicate the study. Sadly, nobody else played with the datasets, but please don’t hesitate to reach out to potrex-team@tracking dot exposed.
We kept sharing our updates while the investigation was ongoing; now we are going to share this final report and some slides on our social media channels.

1. In a few words

Findings and Process

As expected: many little insights and nothing groundbreaking. We don't have any major findings. Potest#1 allowed us to identify some variables at play behind the scenes of PornHub and to test our research skills. In short, these results help us to better understand the Pornhub platform before developing research questions and related tests.

You should consider Potest as part of an open and iterative process: we will design proper methodologies to explore a variety of research questions. If you have any suggestions, you can email us at pornhub-team at tracking dot exposed, or reach out in our Mattermost chat.

Table of Findings

When watching a video, the eight related items might be either the same for everyone (fixed recommendations) or dynamic. In this test we used a video published on PH 11 years ago and another one shared just 24 hours before the experiment. In this setting, we can state that while old videos get their recommendations frozen, new videos are subject to a more dynamic set of recommendations; this insight gives us a new research direction.

PornHub's homepage has five or more sections, but we were able to retrieve just five. Only two are personalized for the user; priority is given to Most View and Hot Video.

Recommendations don't seem to be personalized in our test. We know they should be, but we have not yet isolated clear evidence.

2. The test

2.1 Experiment’s design

It is complicated to make meaningful inferences using data collected by random people on random videos. We first need to be able to control some variables.
We tested PornHub’s recommender system with profiles under our control. This allowed us to understand the role of all the variables involved in the process. However, this strategy has a considerable limitation: it cannot take into account the variety of users’ profiles.
Therefore, we decided to create the following collaborative observation: we asked random people across the world to repeat the same sequence of actions, and then measured how the recommended videos changed. Here is the announcement we shared the same day: we asked contributors to install our browser extension, which records what Pornhub decides to show in your browser, in order to understand how much this process is subject to personalization.
In this report, we call "supporter" the random stranger on the Internet who installed the browser extension and participated in potest1.

We requested the following steps

Step	Link	Why
1	Homepage.	There are regional sections, and we want to see how much PH changes the homepage during the day.
2	Recommendation page.	To see if PH is recommending something unique for you since the beginning.
3	The first video, which has been on Pornhub for 11 years.	We want to collect the 8 related videos below each video page.
4	Recommendation page.	To see if the recommendations have changed since the first look.
5	Second video, published the day before the test.	We want to see the 8 related videos below each video page.
6	Homepage.	To record a second homepage sample.

2.2 What we were looking at

We wanted to compare viewing an old video with viewing a recent one, to see if the related content tends to “freeze in time” or keeps changing. Our hypothesis was that, with an old video, PornHub would return the same eight suggested videos to every user, whereas with a new video PornHub’s recommender system would test a recent set of best recommended videos and suggest some of them to the user, changing the set of recommendations quite often.
Before and after watching the videos, we asked our participants to visit the Home and Recommended pages, in order to highlight the effects of the personalization process. We are aware that the Homepage’s contents change frequently during the day, but we don’t know the underlying mechanisms. Anecdotally, we can say that videos tend to be ‘hyped’ for some hours and then fade away, like waves. Since Recommended videos and the Homepage share the same pool of ‘hyped’ videos, we wanted to verify whether different users across the world share common suggestions.

2.3 How the extraction process worked

The collection process lasted for 24 hours. Our extraction method took into account only the complete sequences: if a sequence is composed of 6 steps, like this one, all the steps must have been recorded in the exact order.
The extraction was done with this Node.js script. Additional notes on the extraction have been documented as announcements.
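The selection rule described above can be sketched as follows. This is a minimal illustration in Python, not the Node.js script used for the actual extraction; the step labels and the `(supporter, step)` record shape are hypothetical and do not reflect the real data format.

```python
from collections import defaultdict

# Hypothetical labels for the six requested steps, in the order they were asked.
EXPECTED_STEPS = ["home", "recommended", "video1", "recommended", "video2", "home"]


def count_complete_sequences(steps):
    """Count non-overlapping runs in `steps` that match EXPECTED_STEPS exactly."""
    n = len(EXPECTED_STEPS)
    count, i = 0, 0
    while i + n <= len(steps):
        if steps[i:i + n] == EXPECTED_STEPS:
            count += 1
            i += n  # skip past the matched run
        else:
            i += 1
    return count


def complete_by_supporter(records):
    """Map each supporter to the number of complete sequences they recorded.

    `records` is an iterable of (supporter, step) pairs, assumed to be
    already sorted by recording time.
    """
    steps_by_supporter = defaultdict(list)
    for supporter, step in records:
        steps_by_supporter[supporter].append(step)
    counts = {s: count_complete_sequences(st) for s, st in steps_by_supporter.items()}
    # Supporters without a single complete sequence are dropped.
    return {s: c for s, c in counts.items() if c > 0}
```

With this rule, a supporter who stopped after four steps contributes nothing, while one who repeated the whole procedure contributes two sequences.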

3. The analysis

We then looked at correlations and patterns. To better perform this analysis, we loaded some sections of our CSV into Gephi, a network analysis tool.

3.1 Videos watched

All the users got the same 8 related videos when watching the old video. With the second video (uploaded the day before the test), on the other hand, we have a very different scenario: the suggested videos differ across users, they vary according to when the video was watched, and they can be clustered in eight topic-related groups.
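One way to check whether a video's related items are fixed or dynamic is to count how many distinct sets of suggestions were recorded across views. The sketch below assumes a hypothetical mapping from a view identifier to the list of related titles; it is an illustration, not the analysis code behind this report.

```python
def distinct_related_sets(related_by_view):
    """Return the distinct sets of related videos seen across all views.

    `related_by_view` maps a view identifier to the list of related titles.
    Order within each list is ignored.
    """
    return {frozenset(titles) for titles in related_by_view.values()}


def is_fixed(related_by_view):
    """True when every view received exactly the same related videos."""
    return len(distinct_related_sets(related_by_view)) == 1
```

A video whose views all map to a single set behaves like the old video here; more than one distinct set indicates dynamic recommendations.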

First video: "lily thai", uploaded 11 years ago.

Here you can see the suggestions recorded for the first video. Each light orange node in the circle is a different participant in the experiment; each orange node is a suggested video, labelled with its title. The video had 8 recommended videos and they were constant for the duration of the test: each watcher got exactly the same recommendations.

Second video: "pussy licking", uploaded 1 day before the test.

Throughout the day Pornhub was testing different clusters of recommendations.

Each circle in the outermost ring represents a different user; some circles are bigger because the user saw the video twice. The other circles represent the titles of the suggested videos. We don't know why some users have different topic-related suggestions, e.g. "pussy licking" (suggested at 12.00), "dildos" (at 15.00).

The clusters are correlated to the visualization time.

For the second video, the clusters of suggestions change according to the viewing time. We are not able to explain the underlying reasons that lead this video to have different recommendations compared to the previous one. Perhaps, since the video is new, the recommendations have not been 'stabilized' yet. Or PornHub could give considerable weight to the time of day at which you are watching a video and, hence, significantly change its recommendations. We would need another test to confirm these ideas.

In the gif, you can see the second video's recommendations changing based on the visualization's timestamp.

The animation shows how the suggestions progress according to the time the users watched the video (from 00.00 to 24.00). Note how one supporter, represented by a larger circle, performed all the steps twice: at the first viewing (H: 00.05) they received the suggestions of the grey cluster, whereas at the second viewing (H: 15.00) they received those of the violet cluster. As already mentioned, it seems that the user receives different recommendations depending on the hour of viewing. Therefore, suggestions appear to depend not only on the user, but also on the viewing time. Note: in the dataset, the random pseudonym associated with this supporter is cheese-cheese-egg, and the node is larger than those of other users because of the additional contributions.
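A rough way to quantify the link between viewing time and suggestions is to compare the overlap of suggestions between views recorded in the same hour with the overlap between views recorded in different hours. The sketch below uses the Jaccard index for overlap; the `hour` and `related` keys are hypothetical, and this is an assumption about how one might check the pattern, not the method used to produce the figures.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections of titles (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_overlap(views, same_hour):
    """Average Jaccard overlap over pairs of views.

    `views` is a list of dicts with hypothetical keys 'hour' and 'related'.
    With same_hour=True only pairs recorded in the same hour are compared,
    with same_hour=False only pairs recorded in different hours.
    Returns None when no pair qualifies.
    """
    scores = [
        jaccard(x["related"], y["related"])
        for x, y in combinations(views, 2)
        if (x["hour"] == y["hour"]) == same_hour
    ]
    return sum(scores) / len(scores) if scores else None
```

If the same-hour average is clearly higher than the different-hour average, the data is consistent with time-dependent clusters, although it does not explain why they occur.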

3.2 Home and Recommended page suggestions.

The suggested videos for the first and second access to the Home and Recommended pages are almost identical. Again, we cannot identify the reasons behind this minor shift for some users. It is probably just random testing.

Here you can see the even smaller differences between the two accesses to the Recommended page. Each orange node is a different user; each ochre node is a suggested video's title. The violet nodes are the ones that appear only in the second visit to the Recommended page.

Before and after the test, suggestions are almost equal.

Here you can see the small differences between the two accesses to the Home page. Each orange node is a different user; each ochre node is a suggested video's title. The violet nodes are the ones that appear only in the second visit to the Homepage.
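The comparison between the two accesses boils down to a set difference per user. A minimal sketch, assuming each access is available as a plain list of suggested titles (the function names are illustrative):

```python
def only_in_second(first_access, second_access):
    """Titles suggested on the second access but not on the first, sorted."""
    return sorted(set(second_access) - set(first_access))


def unchanged_share(first_access, second_access):
    """Fraction of second-access titles that were already in the first access."""
    first, second = set(first_access), set(second_access)
    return len(first & second) / len(second) if second else 1.0
```

The titles returned by `only_in_second` correspond to what the graphs show as violet nodes.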

3.3 Comparing Homepage’s categories

Not all the homepage sections are the same.

In the Homepage, the suggested videos are displayed under different sections. By comparing the different supporters involved in the experiment we found out that:

the first and second sections always mention the watcher’s nationality
the third and fourth sections are explicitly recommended for you (it looks like they are deduced from your interests)
the last section is about recent videos

Before wondering about the logic behind the sections’ dynamics, we can at least observe how they vary among watchers. Font size is proportional to the number of occurrences recorded.
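The proportional font sizes can be derived from simple occurrence counts. The sketch below assumes a flat list of titles recorded for one section across all supporters; the `max_size` parameter and the linear scaling are illustrative choices, not necessarily what was used for the images.

```python
from collections import Counter


def font_sizes(titles, max_size=48.0):
    """Map each title to a font size proportional to its occurrence count.

    The most frequent title gets `max_size`; the others scale linearly.
    """
    counts = Counter(titles)
    if not counts:
        return {}
    top = max(counts.values())
    return {title: max_size * n / top for title, n in counts.items()}
```

For instance, a title seen twice as often as another is rendered at twice the size.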

Section 1

total here 248

Section 2

total here 248

Section 3

total here 248

Section 4

total here 248

Section 5

total here 243

Grouping the (homepage) sections

By grouping the sections in three macro sections we noticed that:

Hot and Most View: the primary entry point for PH relies on collaborative filtering (content selected because it is trending) by regional or global subgroups.
Recommendations: in second position (less important, perhaps?). They can be a general ‘Recommended For You’, a portion likely overlapping with the content served on the /recommended page, and ‘Recommended For You - [Category Name]’.
Recently Featured: content suggested in chronological order (but we do not know why a video becomes Featured).

How does the personalization of the sections work?

PornHub stores in localStorage the sequence of videos watched by each user.
When a profile with new cookies and tracking code navigates a given category for a while, the ‘Recommended for [Category Name]’ section adapts to the selected fetish.
With the stored list of watched videos, PornHub can infer a liked fetish and suggest it in the Recommended section.
In potest#1 we probably didn’t suggest enough videos, and since they did not belong to a specific category it would be hard to see whether they had any influence.

What we didn’t find out, but we’ll keep checking:

We don’t know if any particular producer benefits from any advantageous treatment from the algorithm.
We don’t know if, for not-logged-in users, the recommended page changes according to what has been seen.
We know for sure that the Homepage and the ‘Recommended For You’ section depend on your past activity, but we have not yet linked this to evidence from the test.

4. Other interesting things

Research on pornography leads to recruitment difficulties 🤷

We shared the invitation below on: /r/privacy, /r/italyInformatica, and /r/SampleSize.

Ironic as it may seem, an algorithm (Reddit's anti-spam filter) punished us as spammers, of course:

It wasn’t the only issue: the first algorithm overlord and true Sauron’s eye, Google, played its role as well:

We're working on repeating this test and validating our findings.
By the 25th of March, we will apply this experience to YouTube!

Completed in February 2020 by Claudio, Giulia, Salvatore, Matteo, and Barbara.
