Learning While Aging

Estimate of time for WordPress Plugin directory surpassing two billion downloads

Jeff Chandler from WorePress Tavern recently reported that WordPress Plugin directory has surpassed one billion downloads since March 2006 and he asked readers to estimate when it will reach two billion downloads. Instead of throwing a wild guess, I decided to do some statistical analysis on the data he provided to get a statistical estimate.

Here are the data he provided in his post:

  • March 2006  191,567
  • 2007  2,845,802
  • 2008  15,130,856
  • 2009  49,822,116
  • 2010  72,342,598
  • 2011  108,501,907
  • 2012  141,609,682
  • 2013  182,236,517
  • 2014  241,142,505
  • January 2015 to August 2015  186,243,700

The number in 2006 is very small probably because it was when WordPress started to record downloads.

In order to do the statistical analysis, I have to do some data preparation:

I use the amount of months it took to reach the total downloads plus the previous total downloads. So it took 9 months (in 2006) to reach 191,567 downloads and it took 21 months to reach 3,037,369 (191,567 + 2,845,802) downloads, and it took 33 months to reach 18,168,225 (3,037,369 + 15,130,856) downloads, and etc. So I got the data like this:

MonthsDOWNLOADSLOG_DOWNLOADS
2130373696.482497556
33181682257.2593125
45679903417.832447219
571403329398.147159621
692488348468.395911197
813904445288.591559341
935726810458.757912809
1058138235508.910530253
11210000672509.000029205

When I plot Months vs. DOWNLOADS, I got a figure like this:

WPPluginDownloadsVSMonths

It is clear that the growth of the total downloads is like exponential and in order to do linear regression I need to perform data transformation. So I took base-10 logarithm on DOWNLOADS to get LOG_DOWNLOADS and the results are shown in above table, then I plot Months VS LOG_DOWNLOADS as follows:

WPPluginDownloadsLogVSMonths

Now we can see a linear relationship between Months and LOG_DOWNLOADS, especially starting from the fourth data point (Months = 57), it is a very good linear trend, so I decided to treat the first three data points as outliers and to drop them( I know in real statistical analysis I cannot drop them, but here I am just trying to simplify things). I used the last six rows of data to perform linear regression analysis and I got the prediction function as follows:

LOG(Y) = 0.015180271X+ 7.32581705
Where Y is the total downloads, and X is the number of months

With this function, we can roughly predict that at the end of the year 2015 (Months = 117), the total downloads will be about 1,264,470,661. To reach 2 billion downloads, it will need 130 months(since March 2006), in another word, it will reach 2 billion downloads in 18 months (January 2017).

So what do you think?