Monday, July 14, 2014

Viral Propagation Models for Apps and Social Software

Today’s post is a follow-up to my previous text on software ecosystems : I will focus on the virality of social applications, that is, the ability of applications to grow their customer bases through social networks. This post is more technical than most, because it is unfortunately necessary, but I will try to keep everything “as simple as possible, but no simpler” :).

Social propagation of applications is desirable because the fight to survive on the smartphone is quite tough. Not only do most people download only a few tens of apps (statistics vary according to sources; however, the story is the same), but most of these apps are never used. 80 to 90% of downloaded apps are used only once, then discarded. Becoming one of the few apps that stay in the “smartphone top of mind” (i.e., an active app) is very hard, and being a collected app (installed for future use) seems to be very precarious. This is why the route of the web application (smart responsive HTML5 page with embedded bells & whistles) that is accessed through all the classical Web paths (search, links, etc.) is looking more and more interesting for many companies.

We may categorize the social behavior of apps into three categories:
  • Solo apps: applications whose main goal is to be used on your own, even if the score (of a game) may be shared eventually.
  • Communication apps: applications which are used to synchronously communicate with other people. The value of the service grows with the number of correspondents that may be reached.
  • Social apps: applications which use asynchronous communication to become content publishing platforms. The distinction between “communication & social” will become clearer later on, but we may state right now that the value depends on the amount of available content, which depends on the total amount of time spent by social partners on the social app.

Not surprisingly, we know that solo apps appeal more to men than women. What I want to look at is the ability for social software (a larger category than apps) to propagate itself through social use and recommendation.

1.      Metcalfe’s Law for Communication Software

If we consider a simple communication tool (such as instant messaging), its customer base defines a communication network whose value grows as the square of the number of users (O(N^2)), according to Metcalfe’s Law. Metcalfe's Law states that the value of a communication network grows as the number of possible pairs of connected users.

The value for one individual is linear in the number of users (O(N)), but both the total value and the virality are quadratic. The virality, which is linked to the growth rate, may be seen as the product of the “infected” population (number of users) and the probability for one customer to “infect” another person (that is, to recommend the service), which is linked to her or his satisfaction (hence, to the value).

One may notice that this is already quite different from an epidemiology model, since the probability of transmitting the disease does not only depend on being infected, but also on the number of your infected friends.
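The scaling argument of this section can be illustrated with a minimal Python sketch. The function names and the `recommend_rate` constant are hypothetical, introduced only for illustration; the sketch simply assumes that each user's propensity to recommend is proportional to the individual value.

```python
def total_value(n_users: int) -> int:
    """Metcalfe-style total value: number of distinct pairs of users."""
    return n_users * (n_users - 1) // 2


def individual_value(n_users: int) -> int:
    """Value for one user: number of correspondents that can be reached."""
    return max(n_users - 1, 0)


def growth_pressure(n_users: int, recommend_rate: float = 1e-6) -> float:
    """Toy virality: number of users times each user's propensity to recommend.

    Assumption: the propensity is proportional to the individual value,
    scaled by a hypothetical constant recommend_rate.
    """
    return n_users * recommend_rate * individual_value(n_users)


# Doubling the user base doubles the individual value but roughly
# quadruples both the total value and the growth pressure.
for n in (1_000, 2_000, 4_000):
    print(n, individual_value(n), total_value(n), growth_pressure(n))
```

In a classical epidemic, the per-contact transmission probability would be a constant; here it scales with the size of the infected population, which is where the extra factor of N comes from.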

There are two points which are usually debated in this reasoning. The first remark states that we do not benefit from a very large network of possible contacts, since the number of meaningful correspondents is usually bounded (whether by Dunbar’s number or any other).

The second idea is that all correspondents are not equal and that the distribution of communication time usually follows a law similar to Zipf’s Law. This leads to the result that the total value grows in an O(N log N) fashion. The whole issue boils down to the question of knowing whether the distribution of the communication tool among your possible correspondents is homogeneous (randomly distributed) or not. This is actually a debate about strong ties versus weak ties, one of my favorite topics. If the communication tool is used to communicate with your close friends, then the propagation model follows the strong-ties social graph and we may assume that the value for each customer grows in O(log N) because of Zipf’s law. On the other hand, if the communication tool is used to reach a larger set of people, then the probability that one of these contacts is equipped with the same communication tool is roughly linear with respect to the usage rate, hence the individual value grows in O(N).
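To see where the logarithm comes from, here is a small Python sketch under one simplifying assumption: the k-th most important correspondent is worth 1/k (a Zipf-like weighting). The function names are hypothetical.

```python
import math


def zipf_individual_value(n_users: int) -> float:
    """Value for one user when the k-th closest correspondent is worth 1/k."""
    return sum(1.0 / rank for rank in range(1, n_users))


def zipf_total_value(n_users: int) -> float:
    """Total network value: every user gets the same Zipf-weighted value."""
    return n_users * zipf_individual_value(n_users)


# The individual value tracks log(N) up to an additive constant,
# so the total value grows like N log N instead of N^2.
for n in (100, 10_000, 1_000_000):
    print(n, round(zipf_individual_value(n), 2), round(math.log(n), 2))
```

The sum of 1/k is the harmonic series, which grows like the natural logarithm of N; this is why the individual value is O(log N) in the strong-ties case.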

2.      Social Software and Cumulative Valuation of Time

We now consider an application, like Facebook, that acts as an asynchronous content publishing platform. The key observation is that the value of a Facebook session does not depend on how many friends you have, but on how frequently they visit and contribute.
People have different profiles when it comes to reading and contributing on social platforms. However, it is plausible to assume that (a) the read/write ratio is different for each individual but remains rather stable over time (b) the amount of messages read and written is proportional to time spent on the social platform.  Similarly, the attractiveness (i.e., interest to others) of content varies significantly from one user to another, but we may assume that the interest varies linearly with the amount of messages that are exchanged (this is clearly wrong for “newsworthy events” but seems to be true for the vast majority of exchanges that happen on Facebook).

This leads to a recursive system of complexity equations (written in a rather informal style) :
  •  Total Value = N x Average Value
  •  Average Value = Average Degree  x O(Average Time Spent) x Filtering Factor
  • Average Time Spent = O(Average Value)
The only way to make this system balanced is to assume that the asymptotic behavior of the “Filtering Factor” is O(1/D), where D is the average degree (which makes sense: there is only so much that you can read). So if the average degree grows, some filtering is necessary. For instance, Facebook reports that, every day, it has to choose what to display to each user from among 1500 messages. This is the role of the “Edge Rank” filtering algorithm, a topic which I have discussed in a previous post.
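The balance argument can be spelled out as a short derivation. The symbols are my own shorthand, not part of the informal equations: V is the average value, T the average time spent, D the average degree, F the filtering factor, and each O(·) is read as a simple proportionality with hypothetical constants c_1 and c_2.

```latex
\begin{align*}
V &= c_1 \, D \, T \, F && \text{(average value)} \\
T &= c_2 \, V && \text{(average time spent)} \\
\Rightarrow\; V &= c_1 c_2 \, D \, F \, V && \text{(substituting } T \text{)} \\
\Rightarrow\; D \, F &= \frac{1}{c_1 c_2} && \text{(for } V \neq 0 \text{)} \\
\Rightarrow\; F &= \frac{1}{c_1 c_2 \, D} = O(1/D)
\end{align*}
```

Under this reading, the value V drops out of the equation entirely, so the system constrains the filtering factor but leaves the actual level of value and time spent undetermined.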

Once the role of “filtering” is understood, we are left with a “self-fulfilling” set of circular equations that tells us that the value is proportional to the average time spent, which is proportional to the perceived value. It may be thought of as a disappointing tautology, but it says that similar social platforms may indeed know very different fates.
At this time we can state two things:
  1. The formula that describes the value obtained by a social app user is complex, hence the virality percolation model is complex. It does not compare at all with an epidemiology model since the probability of “infecting” someone depends both on (a) the number of your infected friends (b) how deeply infected they are.
  2. There is no simple model for understanding the spread of social network platforms : there may exist multiple solutions with similar customer bases (N). The example of Google Plus and Facebook springs to mind: they both have large customer bases (1,230 million monthly active users for Facebook and 300 million monthly active users for Google Plus) and average time spent statistics which are totally different (8 hours per month for Facebook versus 7 minutes for Google Plus). Nothing in the percolation models tells us whether Google Plus should grow closer to Facebook in the future; it all depends on much finer details (value provided to the user per unit of time and per unit of meaningful social content). The non-linear nature of the equation (re-entering loop) means that a tiny difference in this value-creation function may lead to a radical difference in customer usage (i.e., the presentation difference produces different time allocation patterns that, in turn, amplify the perceived value difference).
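The amplification effect of the re-entering loop can be illustrated with a toy simulation. Everything in this Python sketch is a hypothetical assumption made for illustration (the saturating response function, the 10-hour budget, the starting point and the two efficiency values); it is not a calibrated model of any real platform.

```python
def time_spent_after(efficiency: float, periods: int = 200,
                     budget: float = 10.0, start: float = 5.0) -> float:
    """Iterate the value/time loop and return the final time spent per period.

    Hypothetical toy model:
      value      = efficiency * time_spent        (value follows time spent)
      time_spent = budget * value^2 / (1 + value^2)  (saturating response)
    """
    time_spent = start
    for _ in range(periods):
        value = efficiency * time_spent
        time_spent = budget * (value * value) / (1.0 + value * value)
    return time_spent


# Two platforms that differ only slightly in "value per unit of time".
for efficiency in (0.19, 0.21):
    print(efficiency, round(time_spent_after(efficiency), 3))
```

With these toy parameters, the loop only has a stable non-zero equilibrium when the efficiency is at least 0.2: the lower value decays toward zero while the higher one settles at a sustained level of usage, although the two platforms start from the same point and differ by about 10% in efficiency.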

Notice that usage and subscription are two very different things, with different percolation models. Subscription is much closer to an epidemiological model (modulo the observations that we made earlier), and it is both easier to predict and to favor viral adoption.

3.      Why Facebook’s Doom Cannot Be Predicted with Epidemiological Models

Early this year there was a lot of excitement about a paper that predicted that Facebook would almost disappear before 2017. This information was printed and commented on in many famous news sites and newspapers. The origin of this information is an arXiv (i.e., submitted for publication) paper from two Princeton PhD students, John Cannarella and Joshua Spechler.

Facebook replied with a humorous answer where they use different buggy-but-convincing statistics charts to show the future decline of Princeton and breathing air. They conclude that “We don’t really think Princeton or the world’s air supply is going anywhere soon. We love Princeton (and air). As data scientists, we wanted to give a fun reminder that not all research is created equal – and some methods of analysis lead to pretty crazy conclusions.”

I actually downloaded and read the article, which is very simple and straightforward. It looks at how social network percolation may be modelled with an epidemiology model (which is clearly wrong, as we showed in the previous section). On the one hand, the paper is “technically correct” : it simply asks what would happen if Facebook’s usage behaved like the spread of a disease. What is incorrect is all the newspapers that drew the wrong conclusion. On the other hand, it is of no value since it is very clear that the model does not fit the problem. The fact that the authors were able to tweak the epidemiological parameters so that the first phase of Facebook growth matched historical data is irrelevant: there are many percolation models that would give a similar “S-curve” phase of growth. I laughed at Facebook’s debunk of the article (the fact that it is quoted as a viral / epidemiology research article from two PhD students from the Mechanical and Aerospace Engineering department should have raised some suspicions), but the debunk misses the point : it is not poor data science, it is poor science to begin with. If you look at the illustration, you will see that the “input data” used for the epidemiology model is the number of “Facebook searches”, which means that the decline may also be interpreted as the complete domination of Facebook !
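To illustrate why matching the first phase of growth proves very little, the following Python sketch compares two textbook models: logistic adoption (no recovery) and a standard SIR epidemic. This is not the model used in the Princeton paper; the function names and parameter values are hypothetical, chosen so that both models have the same initial growth rate.

```python
def simulate_logistic(rate: float, steps: int, initial: float = 0.001) -> list[float]:
    """Discrete logistic adoption: users never leave, the curve saturates at 1."""
    active = initial
    history = [active]
    for _ in range(steps):
        active += rate * active * (1.0 - active)
        history.append(active)
    return history


def simulate_sir(beta: float, gamma: float, steps: int,
                 initial: float = 0.001) -> list[float]:
    """Discrete textbook SIR: 'infected' users recover (leave) at rate gamma."""
    susceptible, infected = 1.0 - initial, initial
    history = [infected]
    for _ in range(steps):
        new_infections = beta * susceptible * infected
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        history.append(infected)
    return history


# Same early growth rate (0.5 - 0.1 = 0.4), radically different futures.
logistic = simulate_logistic(rate=0.4, steps=200)
sir = simulate_sir(beta=0.5, gamma=0.1, steps=200)
for step in (0, 5, 10, 50, 200):
    print(step, round(logistic[step], 4), round(sir[step], 4))
```

Both curves are nearly indistinguishable during the first steps, yet one ends in full adoption and the other in near-total abandonment. Fitting only the growth phase therefore cannot tell which long-term behavior applies.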

4.      Percolation Models for Social Software are Unstable

The previous “model” of section 2 is crude because it does not introduce the connection frequency. To understand and to model the behavior of a social app user, one needs both the average connection frequency and the average time spent per user (for Facebook: 20 minutes for an average session, slightly more than once a day). I tried to build a computational model two years ago, and failed because I did not have enough connection frequency data. This means that I could have used my model to predict almost any possible outcome … somewhat like the Princeton computational experiment.

From a system science perspective, the “re-entrant” characteristic of the “time spent” parameter in the value equation means that any model is bound to be quite unstable and very sensitive to other dimensions (see the conclusion). One could point out that, as a consequence, the outcome proposed by John Cannarella and Joshua Spechler is not impossible :). Let us look at a possible “Facebook displacement” scenario (since users seem to enjoy the time they spend on Facebook, it is logical to assume that such a scenario is the outcome of the introduction of a newer, better platform). It makes perfect sense to illustrate this with the rise of WhatsApp (considering the money spent by Facebook to acquire it, someone else must have thought that there was a real threat). The scenario breaks into four steps:

  1. A new app appears that is more efficient for a new group of users (most likely an age-based group, but not necessarily: it may be a matter of geography or culture). WhatsApp is a great example since it has reached 500 M users in record time.
  2. Because the app is significantly better (from the point of view of new users), it eats away the “free time budget” : the time spent on the new app is taken away from the time spent on Facebook. This is clearly true for WhatsApp with more than 10 hours of monthly use (here also, statistics vary, but the tally is still impressive).
  3. This decreases the perceived value of Facebook for other users, who open an account and then spend some of their SNS time on the new app. This has yet to appear for the WhatsApp case; for instance, in Spain where WhatsApp is very strong, Facebook is still growing, even if the adoption rate is slower than in other European countries. Also, the fastest growing segment of Facebook users is people over 55, and it will be hard to get them away as a community.
  4. Eventually the new app becomes the place where the majority of users go (there is a winner-take-all system dynamic, which has been very profitable for Facebook since it started).

Steps (1) and (2) may happen rapidly, but (3) and (4) will take much longer (this is a guess, as said earlier, the speed has nothing to do with an epidemiological model and is much harder to model). But time spent becomes a habit, and habit takes longer to change (it takes longer to forget a habit than to pick a new one).

A lot of work is available in the scientific community related to percolation over social networks, including the work from Callaway, Newman, Strogatz and Watts, which has inspired my own research about social networks. However, the time aspect of social network usage changes completely the percolation model.
The previous curve shows that social apps have a stronger percolation capability than simpler communication apps.

5.      Conclusion

Rather than drawing a conclusion from this difficulty to efficiently model percolation of social software, I will simply point out a few directions for developing social and viral adoption of applications:
  • One must “pick the right fight”: it does not make sense to fight for usage time if the usage frequency is not high enough. If the frequency is too low, it’s a different game : how to use other SNS for “signaling” (letting people know that their friends have used your app).
  • “Surf the wave instead of racing it” : profit from existing SNS which are created as platforms, to leverage existing social networks to grow your own app's social usage.
  • Make it easy to share your content on competing platforms (a good example being LinkedIn, which allows easy sharing with Twitter, while the reciprocal exchange, that is, sharing from Twitter on any other SNS, is not possible).
  • Empower your users to do whatever they please with your app, making it a true "platform". This follows from the observation that increasing time spent will increase value, hence adoption. This is something that Facebook has been quite good at (although this is a subject of debate), and that Snapchat and Instagram are also good examples of.
  • Think about “value / effort” all the time and focus on simplicity, usability and speed. Especially, to the previous point, sharing/publishing must be as effortless as possible. We are back to the “maximize the value per unit of time and unit of content” principle stated in Section 2. The dynamics of content/time percolation means that a small efficiency competitive advantage can accumulate rapidly into a larger content & customer base sustainable advantage.
