final report - poTEST#1

January 2020: the first coordinated observation of PornHub's algorithm. Findings, and how you can reproduce the experiment

After our global call on the 19th of January, we are glad to follow up and say that:

1. In a few words

Findings and Process

As expected: many little insights and nothing groundbreaking. We don't have any major findings. Potest#1 allowed us to identify some variables at play behind the scenes of PornHub and to test our research skills. In short, these results help us better understand the Pornhub platform before developing research questions and related tests.

You should consider Potest as part of an open and iterative process: we will design proper methodologies to explore a variety of research questions. If you have any suggestions, you can email us at pornhub-team at tracking dot exposed, or reach out in our Mattermost chat.

Table of Findings

  • When watching a video, the eight related items might be either the same for everyone (fixed recommendations) or dynamic. In this test we used one video published on PH 11 years ago and another shared just 24 hours before the experiment. With this setup, we can state that, while the old video had its recommendations frozen, the new video was subject to a more dynamic set of recommendations; this insight gives us a new research direction.

  • PornHub's homepage has five or more sections, but we were able to retrieve just five. Only two are personalized for the user; in fact, priority is given to Most View and Hot Video.

  • Recommendations don't seem to be personalized in our test. We know they should be, but we haven't yet isolated clear evidence.
2. The test

    2.1 Experiment’s design

    We asked participants to perform the following steps:

    Step | Link | Why
    1 | Homepage | There are regional sections, and we want to see how much PH changes the homepage during the day.
    2 | Recommendation page | To see if PH is recommending something unique for you from the beginning.
    3 | First video | It has been on Pornhub for 11 years. We want to collect the 8 related videos below the video.
    4 | Recommendation page | To see if the recommendations have changed since the first look.
    5 | Second video | Published the day before the test. We want to see the 8 related videos below the video.
    6 | Homepage | To record a second homepage sample.

    2.2 What we were looking at

    2.3 How the extraction process worked

    3. The analysis

    We then looked for correlations and patterns. To support this analysis, we loaded some sections of our CSV into Gephi, a network analysis tool.
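
As a rough illustration of this step, here is a minimal sketch of how (user, suggested video) pairs could be turned into an edge list that Gephi's spreadsheet importer accepts. The function name `to_gephi_edges` and the sample rows are hypothetical; the real column layout of our CSV may differ.

```python
import csv
import io
from collections import Counter


def to_gephi_edges(rows):
    """Turn (user, suggested_title) pairs into a weighted edge-list CSV."""
    # Count how many times each user was shown each suggested video.
    weights = Counter((user, title) for user, title in rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Gephi reads Source/Target (and optionally Weight) columns as edges.
    writer.writerow(["Source", "Target", "Weight"])
    for (user, title), weight in sorted(weights.items()):
        writer.writerow([user, title, weight])
    return out.getvalue()


# Made-up rows, only to show the shape of the data (not real results).
sample = [
    ("user-1", "video-a"),
    ("user-1", "video-b"),
    ("user-2", "video-a"),
    ("user-1", "video-a"),
]
edges_csv = to_gephi_edges(sample)
print(edges_csv)
```

Users and video titles both become nodes, so a video suggested to many users ends up with many incoming edges.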

    2.3.1 Videos watched

    All the users got the same 8 related videos when watching the old video. With the second video (uploaded the day before the test), we have a very different scenario: the suggested videos differ across users, they vary according to when the video was watched, and they can be clustered into eight topic-related groups.

    First video: "lily thai", uploaded 11 years ago.

    Here you can see the suggestions recorded for the first video. Each light orange node in a circle is a different participant in the experiment; each orange node is a suggested video, labelled with its title. The video had 8 recommended videos, and they stayed constant for the duration of the test. Each watcher got exactly the same recommendations.
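
The "frozen recommendations" observation can be checked mechanically. The sketch below assumes a hypothetical mapping from participant pseudonym to the list of related titles they saw; the names and data are illustrative only.

```python
def all_same_recommendations(related_by_user):
    """Return True if every participant saw the same set of related videos."""
    # Order is ignored: only the set of titles matters here.
    distinct_sets = {frozenset(titles) for titles in related_by_user.values()}
    return len(distinct_sets) <= 1


# Made-up data: two users with identical sets, then two with different sets.
frozen = {"user-1": ["a", "b", "c"], "user-2": ["c", "b", "a"]}
dynamic = {"user-1": ["a", "b", "c"], "user-2": ["a", "d", "e"]}

print(all_same_recommendations(frozen))   # True
print(all_same_recommendations(dynamic))  # False
```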

    Second video: "pussy licking", uploaded 1 day before the test.

    Throughout the day, Pornhub was testing different clusters of recommendations.

    Each circle in the outermost ring represents a different user; some circles are bigger because the user saw the video twice. The other circles represent the titles of various videos. We don't know why some users have different topic-related suggestions, e.g. "pussy licking" (suggested at 12.00) and "dildos" (at 15.00).

    The clusters are correlated with the viewing time.

    In the second video, the clusters of suggestions change according to the viewing time. We are not able to explain the underlying reasons why this video has different recommendations compared to the previous one. Perhaps, since the video is new, the recommendations have not been 'stabilized' yet. Or PornHub could attribute considerable importance to the time of day at which you are watching a video and, hence, significantly change its recommendations. We would need another test to confirm these ideas!

    In the gif, you can see the second video's recommendations changing based on the viewing timestamp.

    The animation shows the progression of the suggestions according to the time at which users watched the video (from 00.00 to 24.00). Note how one supporter, represented by a larger circle, performed all the steps twice: at the first viewing (H: 00.05) they received the suggestions of the grey cluster, whereas at the second viewing (H: 15.00) they received those of the violet cluster. As already mentioned, it seems that a user receives different recommendations depending on the hour of viewing. Suggestions therefore appear to be related not only to the user, but also to the viewing time. Note: if you look at the dataset, the random pseudonym associated with this supporter is cheese-cheese-egg, and the node is bigger than the other users' nodes because of the larger number of contributions.
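
One simple way to look for this time dependence is to group observations by the set of related videos and list the hours at which each set was seen. This is a minimal sketch, not the method we used in Gephi; the function name, the tuple layout and the timestamps are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def clusters_by_hour(observations):
    """Map each distinct set of related videos to the hours it was seen at."""
    hours_by_set = defaultdict(set)
    for watched_at, titles in observations:
        # watched_at is assumed to be an ISO 8601 timestamp string.
        hour = datetime.fromisoformat(watched_at).hour
        hours_by_set[frozenset(titles)].add(hour)
    return {titles: sorted(hours) for titles, hours in hours_by_set.items()}


# Made-up observations: (timestamp, related titles). Not real data.
sample_obs = [
    ("2020-01-19T00:05:00", ["video-a", "video-b"]),
    ("2020-01-19T15:00:00", ["video-c", "video-d"]),
    ("2020-01-19T15:40:00", ["video-d", "video-c"]),
]
for titles, hours in clusters_by_hour(sample_obs).items():
    print(sorted(titles), hours)
```

If the same set only ever appears within a narrow range of hours, that is compatible with the time-based hypothesis, although it does not prove it.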

    The suggested videos for the first and second access to the Home and Recommended pages are almost identical. Again, we cannot identify the reasons behind this minor shift for some users. It is probably just random testing.

    Here you can see the even smaller differences between the two accesses to the Recommended page. Each orange node is a different user; each ochre node is a suggested video's title. The violet nodes are the ones that appear only in the second viewing of the Recommended page.

    Before and after the test, suggestions are almost equal.

    Here you can see the small differences between the two accesses to the Home page. Each orange node is a different user; each ochre node is a suggested video's title. The violet nodes are the ones that appear only in the second viewing of the Homepage.
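
The nodes that appear only in the second viewing amount to a set difference between two samples of the same page. A minimal sketch, with a hypothetical function name and made-up titles:

```python
def new_in_second_access(first, second):
    """Titles shown on the second access that were absent from the first."""
    seen = set(first)
    # Keep the order in which titles appeared on the second access.
    return [title for title in second if title not in seen]


# Made-up samples of the same page, before and after the test.
first_access = ["video-a", "video-b", "video-c"]
second_access = ["video-a", "video-d", "video-c"]

print(new_in_second_access(first_access, second_access))  # ['video-d']
```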

    2.3.3 Comparing Homepage’s categories

    Not all the homepage sections are the same.

    On the Homepage, the suggested videos are displayed under different sections. By comparing different supporters involved in the experiment we found out that:

    Before wondering about the logic behind the sections' dynamics, we can at least observe how they vary among watchers. Font size is proportional to the number of occurrences recorded.

    Section 1

    Section 2

    Section 3

    Section 4

    Section 5
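
The font-size scaling mentioned above can be sketched as a simple count of how often each section title was recorded across watchers. The function name, the `max_pt` parameter and the sample counts below are hypothetical; only the section names come from this report.

```python
from collections import Counter


def section_font_sizes(section_names, max_pt=40):
    """Font size per section title, proportional to its occurrence count."""
    counts = Counter(section_names)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent title gets max_pt; the others scale linearly.
    return {name: max_pt * n / top for name, n in counts.items()}


# Made-up observations of the first homepage section across watchers.
observed = ["Hot Video", "Hot Video", "Hot Video", "Hot Video", "Most View", "Most View"]
print(section_font_sizes(observed))
```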

    Grouping the (homepage) sections

    By grouping the sections into three macro-sections, we noticed that:

    1. Hot and Most View: the primary entry point for PH leverages collaborative filtering (content selected because it is trending) by regional or global subgroups.
    2. Recommendations: these come in second position (less important, perhaps?) and can be a general ‘Recommended For You’, a portion likely overlapping with the content served on the /recommended page, or ‘Recommended For You - [Category Name]’.
    3. Recently Featured: content suggested in chronological order (but we do not know why a video becomes Featured).

    How does the personalization of the sections work?

    What we didn’t find out, but we’ll keep checking:

    4. Other interesting things

    Research on pornography leads to recruitment difficulties 🤷

    We shared the invitation below on: /r/privacy, /r/italyInformatica, and /r/SampleSize.

    Ironic as it may seem, an algorithm (Reddit's antispam filter) flagged us as spammers, of course:

    It wasn’t the only issue: the first algorithm overlord and true Eye of Sauron, Google, played its role as well:

    We're working on repeating this test and validating our findings.
    By the 25th of March, we will apply this experience to YouTube!

    Completed in February 2020 by Claudio, Giulia, Salvatore, Matteo, and Barbara.