Project analysis

Researching a method for the critical and independent comprehension of the algorithmic personalization in pornography. The Pornhub case.

Read the complete project analysis .pdf

Project introduction

poTREX saves the videos suggested to different Pornhub users as evidence, so that we can study and visualize how these suggestions change over time, by geographical area, or per user, as well as the mechanics that make the platform different from other media and social networks. Because Pornhub behaves in a quasi-monopolistic fashion, we should acknowledge the large impact it can have in shaping the perception of pornography, in the same way that we are concerned about YouTube’s personalization algorithm. The system separates the analysis methodology from the acquisition technology. poTREX is neutral with respect to the research being carried out. It is not neutral in absolute terms, because a technology that allows the analysis of a closed-source algorithm is inherently empowering for users and researchers.

Once the poTREX browser extension is installed, the tool generates a cryptographic key that is necessary to authenticate the evidence. When a video is watched on Pornhub, the extension records the watched video and its related videos in a centralized database. The collected data are anonymized (only the cryptographic fingerprint is kept), and users retain all the rights described by the GDPR. poTREX treats the sequence of videos watched as equivalent to personal data, so only the owner of the cryptographic key in the extension can see this sequence. poTREX considers every watched video as unlinked evidence, and its primary service is to let researchers compare whether the same video received different suggestions. The person running the research has the task of interpreting these differences and thus of defining the methodology. The methodology will therefore be the one chosen by those who set up the research, and it will depend entirely on the research question.
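The comparison described above, checking whether the same video received different suggestions, can be illustrated with a minimal Python sketch. It is not the poTREX implementation: the record fields (`supporter`, `video_id`, `related`) and the sample values are hypothetical, and Jaccard similarity is only one possible way to measure how much two suggestion lists overlap.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Overlap between two suggestion lists: 1.0 = identical sets, 0.0 = disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_suggestions(evidence):
    """Group evidence by watched video and compare every pair of observations.

    `evidence` is a list of dicts with hypothetical keys:
    'supporter' (pseudonym), 'video_id', 'related' (list of suggested video ids).
    Videos observed only once are skipped, since there is nothing to compare.
    """
    by_video = defaultdict(list)
    for record in evidence:
        by_video[record["video_id"]].append(record)

    report = {}
    for video_id, records in by_video.items():
        pairs = [
            (left["supporter"], right["supporter"],
             jaccard(left["related"], right["related"]))
            for left, right in combinations(records, 2)
        ]
        if pairs:
            report[video_id] = pairs
    return report


# Invented sample data, for illustration only.
evidence = [
    {"supporter": "guacamole-pizza-mascarpone", "video_id": "v1", "related": ["a", "b", "c"]},
    {"supporter": "ramen-tofu-falafel", "video_id": "v1", "related": ["b", "c", "d"]},
    {"supporter": "ramen-tofu-falafel", "video_id": "v2", "related": ["x"]},
]
report = compare_suggestions(evidence)
```

A low similarity for the same video is only a signal of divergence; interpreting it remains the task of whoever defines the research methodology.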

A useful premise for the creation of the tool, and for every subsequent analysis, is that Pornhub has a very particular relationship with advertising. The advertisements that appeared during the different experiments were more or less always the same, whatever type of content the users looked at and whatever their sex, gender and sexual orientation. There doesn’t seem to be any need for profiling in order to provide targeted advertising. Although in January 2016 the clothing brand Diesel announced and then launched an advertising campaign on Youporn and Pornhub, there is still no trace of a constant presence of mainstream advertising on these sites. The advertisements served by TrafficJunky are for phantom products: basically penis enlargement, viagra, paid live webcams and prostitution sites. For this reason we could speculate that the algorithmic personalization of Pornhub follows slightly different motivations from those of traditional social media, although it is always aimed at a tangible economic return in a capitalist economy. The physiological collapse of attention after masturbation usually leads users to abandon the site; insisting on keeping them there, as Facebook tries to do, would take great effort and would probably be useless.

If on the one hand there is the need to provide videos that appeal to the users and guarantee a simpler and faster search, on the other the company tries to keep users on the site for as long as possible, or at least make sure that a user accesses it more and more often. Does it lead the users towards addiction? Where does individual responsibility begin and where does that of the brand end? If platforms like Pornhub and other social networks were considered as public utility companies, they would have to adhere to strict antitrust rules and guarantee minimum standards of egalitarian and non-discriminatory treatment, in whatever way it is intended.

The user experience itself changes little, for instance, between heterosexual men and women, even though Pornhub Insights reports different behaviours, exemplified by a tendency for women to search for videos of cunnilingus 281% more than men. When registering a new account, users are asked about their sex and their sexual orientation. It should be noted that what seems to matter is only who and what arouses the user, but it is all based on a model of sexuality aimed at pleasing the male: a lesbian woman can choose from the same range of options proposed to a heterosexual man, while a heterosexual woman receives the same proposals as a homosexual man.

According to 2018 statistics derived from Google Analytics (demographic data are limited to two genders, male and female), the female audience is reported to be around 30% and constantly increasing, which is why Pornhub has introduced sections/categories such as Popular with Women, stating, perhaps in contrast with feminist directors, that “Porn for women takes many forms, and there’s no single genre that fully defines ‘porn for her’. That’s why we’ve compiled all of the porn videos that are most-watched and most favored by real women. Female-friendly porn isn’t one-size-fits-all here, you find everything from story-driven, passionate softcore porn to hardcore gangbangs. The one thing they all have in common? Real women actually prefer them”. Assuming, therefore, that there are differences in the way sexuality is lived, as almost all feminists declare, would this be a way to introduce female sexuality, or a way of adapting to male sexuality and pleasing male desires? Do the biases present in the personalization algorithms and in the user experience design discriminate against female sexuality? How much do they influence male sexuality as well?

poTREX was created as a tool to answer these and other questions and to allow researchers, feminists and anyone interested to analyze site-user-algorithm interactions independently and freely. It can also be a great way to raise awareness among less informed Internet users. The thought of someone observing you closely and predicting your behaviour even while you watch porn videos can be particularly disturbing, given that this activity is considered very personal, taboo and highly exploitable for blackmail. In particular, it is important to understand that incognito mode doesn’t mean you are surfing the net anonymously, and that both the browser and Pornhub observe everything: IP address, cookies, geographic location, time of visit, which hardware and software you are using, which videos are searched for, which are opened and for how long they are watched, up to obtaining a unique, identifying fingerprint of individuals and their digital behaviour.

Compared to other data collection businesses like Google, Microsoft and Facebook, MindGeek claims to be relatively respectful of users’ privacy. In fact, the company stated to Quartz, in the article of 13 December 2018 “Porn sites collect more user data than Netflix or Hulu. This is what they do with it.”, that even though it uses user data to create and recommend pornographic material, it does not sell these data to third parties. But this doesn’t mean that it is immune to possible problems, given for instance the numerous data breaches that have taken place in the context of pornography.

“Streaming is not just about content distribution, but also about communication,” as Kal Raustiala, professor at the UCLA School of Law and International Institute, told Quartz; “when you stream a video or listen to a song, you’re sending information that can be measured”. It is therefore not one-way communication: there is interaction.

According to a recent study by Raustiala and Christopher Sprigman, a professor at New York University School of Law, MindGeek is particularly advanced in the analysis of this type of communication, while Netflix and Spotify, for example, know their users a little less well than the pornography giant. Raustiala has compared this knowledge of the user to a spectrum that depends on several factors, on which MindGeek is further along in its use of big data in a feedback cycle. This is because MindGeek relies heavily on data-driven authorship and content tailored to viewers in order to encourage users to subscribe, so it doesn’t have to worry much about advertising. MindGeek has several companies that analyze data.

Chauntelle Tibbals, sociologist and author of Exposure: A Sociologist Explores Sex, Society, and Adult Entertainment, has said that “the fact that people now have access to various forms of sexual expression is a good thing. But people who have access to various forms of sexual expression without context and/or an accurate and pertinent sex education complicates the situation instead”. For this reason, investigating these issues can be valid and beneficial for today’s society and future generations.

In the first phase of poTREX development it is important to understand where and how the Pornhub personalization algorithm works, to identify the metrics, to optimize the tool and then to carry out higher-level analyses, for example of the impact it can have on sexuality.

Similarly to the test done by fbTREX during the Italian elections, it is advisable to use simulated situations in order to bring our actors into a situation of divergence and study what factors are causing it. Once these factors are documented, it will be possible to proceed.

The actors of the experiments were different Firefox profiles, which ensured a clean browser without history or cookies, although it is not certain that Pornhub doesn’t use hardware fingerprinting yet. Some Firefox profiles were full bots, that is, a fictitious Pornhub account was created specifically to simulate their identity, while other profiles acted without any registration; this allowed us to observe how the site interacts differently with registered and non-registered users.

It is also important to observe how the behavior of Pornhub changes in the absence of previous cookies, or with the incognito mode, which is often used by users.

The situations to be considered were therefore four, and the hypothesis was that the difference in content would increase as customization increased. In reality, between an unregistered user and one with an account there is not much difference in terms of the videos. When you log in to your account the customization is always individual, as it is for the unregistered user who browses in normal mode. The only case in which this doesn’t happen is browsing in incognito mode without logging in: here the suggestions are the same for all profiles and correspond to those that appear when you reset the preferences or access the site for the first time. These are likely country-level suggestions. A more personalized experience is therefore obtained by accessing the site with your account or by avoiding incognito mode; this incentive would allow Pornhub to collect more continuous data about users and profile them with greater accuracy.

Context matrix

The videos are classified according to a hundred categories recognized by Pornhub, as well as a countless series of tags added by the uploader. A video can of course belong to several categories and carry several tags. The categories can indicate the physical appearance or ethnicity of the performers, sexual fantasies, the sexual practices played out, or audiovisual characteristics. As users express their preferences, Pornhub observes them like Argus Panoptes, the giant of Greek mythology, with a hundred eyes-categories before which it is impossible to hide. A particular case is the Gay Only section, which includes some unique categories such as Daddy, Military, Twink and others; in the visualization these were not considered subsets of the Gay category. Similarly, Popular With Women, Verified and VR were not included, because Pornhub itself treats them as a separate block, a middle ground between a section and a category. Some categories are much larger than others; they represent mainstream porn and largely relate to the physical characteristics of performers, generally female.

Categories taxonomy

Categories by number

Clicking on any category reveals ten other categories that are frequently combined with the selected one. By collecting the suggestions for each category, it is possible to create a network graph. What emerges from the graph, created with the network analysis and visualization software Gephi, is that some categories, although they do not include a large number of videos, are suggested much more often than the others. This can confirm or contradict the official statistics of Pornhub Insights. The Japanese category, for example, includes about thirty thousand videos, but it is among the most suggested, as well as the most searched according to official statistics. So does this graph influence the statistics, or does it simply describe them?
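As a rough illustration of how such a suggestion graph can be assembled before it is loaded into a tool like Gephi, the following Python sketch turns a mapping from each category to its suggested categories into directed edges and counts how often each category is suggested (its in-degree). The category associations below are invented toy data, not measurements from the experiments, and the function names are hypothetical.

```python
from collections import Counter


def suggestion_edges(suggestions):
    """Flatten {category: [suggested categories]} into directed (source, target) edges."""
    return [
        (source, target)
        for source, targets in suggestions.items()
        for target in targets
    ]


def in_degree(edges):
    """Count how many times each category appears as a suggestion."""
    return Counter(target for _, target in edges)


# Toy data: on the site each category shows ten suggestions; two or fewer are used here.
suggestions = {
    "Amateur": ["Japanese", "Webcam"],
    "Webcam": ["Japanese", "Amateur"],
    "HD Porn": ["Japanese"],
}

edges = suggestion_edges(suggestions)
ranking = in_degree(edges).most_common()  # most suggested categories first
```

Comparing this ranking with the number of videos in each category is what exposes categories that are suggested far more often than their size would predict.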

Network Graph

What is certain is that the graph doesn’t represent the actual links between the categories: it doesn’t indicate which categories are most often associated within the videos themselves. Japanese cannot be the category that occurs most often within the videos of every category where it appears as a suggestion. What is suggested would therefore depend on which categories are most frequently searched together. Japanese is also the second most popular category of 2018, as well as being preferred by men, but it must also be said that Japan is the fourth country in terms of traffic. Since both algorithm and population variables are at play, it is difficult to reverse-engineer the algorithm, and it is essential to simulate circumstances that standardize accesses and make them comparable, since these can to some extent neutralize the population variable. Simulations can also recreate normally marginal, subordinate and non-dominant situations, giving space to what feminist philosopher Sandra Harding calls strong objectivity: understanding how knowledge is constructed by taking into account the biases of researchers, who cannot be considered neutral, since they are not exempt from having their own perspective of beliefs, values and expectations.

The homepage is the main page of the site, and the videos it contains are divided into five sections: Hot Porn Videos in [country], Most Viewed Videos in [country], Recommended For You, Recommended Category For You - [category] and Recently Featured XXX Videos. We know from company statements that the hot videos, which appear first on the homepage, are calculated from the last videos viewed by geographically close users immediately before leaving the site. Even on a first visit to the site, the two sections that contain personalized content already have videos. From the Recommended For You section, clicking on More Videos leads to the /recommended page, where the company declares “Recommended videos are based on your browsing history and/or popular videos from your location” and where it is possible to reset the recommendations or even disable them. This page also contains the Taste Profile, a sort of questionnaire that allows you to express your preferences explicitly, but you must be registered to save them for future visits. Registered users can also rate videos positively or negatively. Another function reserved for those who have an account is commenting. How are the contents of these pages customized? An experiment in the initial phase, designed to test whether the videos proposed under a video were also personalized, showed that the last videos seen before leaving the site carry particular weight in the personalization. Hot Porn Videos are nothing but this on a national basis, on the assumption that geographical proximity also defines individual tastes. The last video would in fact be the one that most satisfied the user and therefore, according to the company, the most exciting one. Previously viewed videos and user-searched terms seem to have less relevance, but they determine the recommended categories in Recommended Category For You - [category].
In fact, bot A watched five videos for three searches: Masturbation, Solo Female and Webcam. Later it watched a video unrelated to the previous ones, belonging to the Interracial category, and immediately closed the browser. The following day, of the 21 recommended videos on the first page of /recommended on desktop, 19 were from the same production company as the last video. The experiment showed that there may be inconsistencies between recommended videos and recommended categories. While bot A’s recommended categories were consistent with its searches, its recommended videos were more similar to what it had seen just before closing the browser. It would be interesting to ask sexologists and psychologists what this could mean and how it could be experienced by the user. Hence the idea of observing the videos recommended on this page more closely.

Still taking into consideration the /recommended page, two bots were compared: one that watched videos belonging to the Japanese category and the second to Transgender. The two categories were chosen because they have almost the same amount of videos, but one category is more suggested than the other, as can be seen in the network graph, and would therefore have a different weight for Pornhub. Consequently it should have a different weight for those who read these suggestions and the official statistics too. What we tried to grasp is how and when the site re-proposes the categories examined.

Research protocol

Radar Graph

The Kiviat diagram, or spider web graph, was chosen for the immediacy with which values can be compared across different observations and profiles. Although this type of graph represents the outermost parts in a slightly disproportionate way, it is suitable for poTREX’s objectives of studying and comparing personalization. Each ray represents one of the categories that appeared among the 21 videos on the first page of /recommended. The distance from the center of the point marked on a ray is given by the percentage of videos in which the category appears. The points on the rays are joined with segments, so that the graph takes on a shape specific to each profile. This shape can tell us something about user profiling and about which categories are most connected to each other. What we can observe is that the personalization algorithm is in place and working and that, under the same procedure, it seems to work in roughly the same way for both profiles. The two categories sought are by far the most pushed, each appearing in almost 100% of the videos, and they represent about 20% of the categories present; the remaining categories are those most frequently associated with them within the videos. In other words, the categories that appear together with the starting ones are those present in the proposed videos. It would be especially interesting to investigate how the suggestions differ as the number of categories sought increases, in order to better understand the mechanisms of personalization and whether it is influenced by how and for how long videos are watched. The HD Porn category, which is the one with the most videos, turns out to be a false friend: it could say something about the quality of the videos in different categories, but it is always present in large numbers for every profile and therefore acts as a distortion.
Another difficulty would probably be comparing users who have used the site in different languages, so it will be necessary to build a dictionary; for now, comparison is possible only between users with the same language.
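The value plotted on each ray of the radar graph can be sketched as a simple percentage. The Python example below assumes that each recommended video is available as a list of its category names (a hypothetical data shape) and uses four invented videos instead of the 21 found on the first page of /recommended.

```python
from collections import Counter


def category_shares(videos):
    """Return, for each category, the percentage of videos in which it appears.

    `videos` is a list of category lists, one per recommended video.
    A category is counted at most once per video.
    """
    total = len(videos)
    if total == 0:
        return {}
    counts = Counter()
    for categories in videos:
        counts.update(set(categories))
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented sample: four recommended videos and their categories.
videos = [
    ["Japanese", "HD Porn"],
    ["Japanese"],
    ["HD Porn", "Japanese"],
    ["Amateur"],
]
shares = category_shares(videos)
```

Each key of `shares` corresponds to one ray and each value to the distance of the point from the center; computing the same dictionary for two profiles gives the two shapes to be compared.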

The role of the spider web graphic could be to represent what Pornhub thinks the user should see, directly on the site and in real time, as well as allow comparison with other users. During the experiments, new research questions and new hypotheses about the exact functioning of the Pornhub personalization algorithm have emerged, which should be highlighted during the development and launch of the extension.

In conclusion, poTREX is a browser extension for Firefox and Chrome that automatically records the HTML code Pornhub sends to users. After installing the add-on, an icon with the poTREX logo appears in the toolbar on the right; clicking on it is a quick way to access the extension’s settings. The add-on saves data only while the user is navigating the Pornhub website, and users have total control over these data: they can change the identifier whenever they want or delete specific content. Security measures are good enough for this project phase: we do not collect data to surveil, study, or profile individuals. Our data processing goal is to enable people to do their own algorithm analysis. A “supporter” is each browser in which the poTREX extension is operating. If different people, or different logged-in profiles, use the same browser, the extension treats all of them together. If two browsers (Firefox and Chrome) both have the extension installed and the same Pornhub profile is accessed from both, we still consider the two collections logically separate, as if they were two different users.

A pseudo-random sequence of three foods will identify every browser. For instance, a browser can be called guacamole-pizza-mascarpone, and the user will be the only one knowing this and the only one who can change or reveal this pseudonym.
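One way such a food pseudonym could be produced is sketched below in Python. This is an assumption made for illustration, not the scheme poTREX actually uses: the text does not say how the sequence is generated, the word list is hypothetical and far shorter than a real one would need to be, and deriving the words from a SHA-256 hash of the browser’s public key is simply a convenient way to make the result reproducible.

```python
import hashlib

# Hypothetical word list; a real one would need many more entries to avoid collisions.
FOODS = ["guacamole", "pizza", "mascarpone", "ramen", "falafel", "tofu", "paella", "hummus"]


def pseudonym(public_key: str, words: int = 3) -> str:
    """Derive a reproducible food pseudonym from a key string (assumed scheme)."""
    digest = hashlib.sha256(public_key.encode("utf-8")).digest()
    # Use one digest byte per word to pick an index into the word list.
    picked = [FOODS[digest[i] % len(FOODS)] for i in range(words)]
    return "-".join(picked)


name = pseudonym("example-public-key")
```

Under this assumption, generating new key material would also yield a new pseudonym, which is one way a user could change the identifier.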

In this alpha stage, the data collection is considered safe. Adopters have control over their data. Each browser extension generates secret cryptographic material that is necessary to reach a secret URL, from which the data can be administered (deleted, downloaded, tagged). The data collected are meant to be at the service of the collector. poTREX was developed alongside ytTREX, and they share most of their internal logic, constraints and scope. The website offers a small set of visualizations so far. Adopters who want to carry out more professional analysis and research can download a CSV of their recorded evidence.
