Power-law Distributions in Empirical Data
This page is a companion for the paper on power-law distributions in empirical data, written by
Aaron Clauset (me),
Cosma R. Shalizi, and
M.E.J. Newman.
This page hosts implementations of the methods we describe in the article,
including several by authors other than us. Our goal is for the methods to
be widely accessible to the community.
Python users should refer to the
powerlaw package by Alstott et al.
R users should refer to the poweRlaw package by Gillespie.
A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data.” SIAM Review 51(4), 661-703 (2009).
Y. Virkar and A. Clauset, “Power-law distributions in binned empirical data.”
Annals of Applied Statistics 8(1), 89-119 (2014). (arXiv:get the code)
Random number generators
This function generates continuous values randomly distributed according to
one of the five distributions discussed in the article (power law,
exponential, log-normal, stretched exponential, and power law with cutoff).
Usage information is included in the file; type ‘help randht’ at the Matlab
prompt for more information.
randht.m (Matlab, by Aaron Clauset)
randht.py (Python, by Joel Ornstein)
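For the power-law case, the idea can be sketched with inverse-transform sampling. This is a minimal illustration, not the code in randht.m or randht.py: the function name sample_power_law and its arguments are hypothetical, and only the continuous power law is covered.

```python
import random


def sample_power_law(n, alpha, xmin=1.0, rng=None):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin.

    Hypothetical helper for illustration; not the randht interface.
    """
    if alpha <= 1.0 or xmin <= 0.0:
        raise ValueError("requires alpha > 1 and xmin > 0")
    rng = rng or random.Random()
    # Inverse-transform sampling: if u is uniform on [0, 1), then
    # xmin * (1 - u)^(-1 / (alpha - 1)) follows the power law.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Example: 10,000 draws with alpha = 2.5 and xmin = 1.
samples = sample_power_law(10000, alpha=2.5, xmin=1.0, rng=random.Random(42))
sample_median = sorted(samples)[len(samples) // 2]
```

The median of this distribution is xmin * 2^(1/(alpha-1)), which gives a quick sanity check on the draws.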
Fitting a power-law distribution
This function implements both the discrete and continuous maximum likelihood
estimators for fitting the power-law distribution to data, along with the
goodness-of-fit based approach to estimating the lower cutoff for the scaling
region. Usage information is included in the file; type ‘help plfit’ at the
Matlab prompt for more information.
plfit.m (Matlab, by Aaron Clauset)
plfit.r (R, by Laurent Dubroca)
plfit.py (Python, by Adam Ginsburg)
plfit.c (C++, by Wim Otte; includes plvar.c)
plfit.c (C++, by Tamas Nepusz)
plfit.py (Python, by Joel Ornstein)
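As a rough sketch of the continuous case, the fit can be written as a scan over candidate xmin values, keeping the one whose fitted power law has the smallest Kolmogorov-Smirnov distance to the data. This is not a transcription of any plfit implementation listed above: the name fit_power_law and the min_tail cutoff are assumptions made for the example, the data are assumed positive, and the discrete estimator is omitted.

```python
import bisect
import math
import random


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of a sorted tail and the fitted CDF."""
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        gap = max(gap, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gap


def fit_power_law(data, min_tail=10):
    """Return (alpha, xmin, ks) for the xmin that minimizes the KS distance.

    min_tail is a hypothetical safeguard against fitting tiny tails.
    """
    data = sorted(data)
    best = None
    for xmin in sorted(set(data)):
        tail = data[bisect.bisect_left(data, xmin):]
        if len(tail) < min_tail or tail[-1] <= xmin:
            break
        # Continuous maximum likelihood estimate of the exponent.
        alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best


# Example on synthetic data with alpha = 2.5 and xmin = 1.
rng = random.Random(1)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
alpha_hat, xmin_hat, ks_hat = fit_power_law(data)
```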
Visualizing the fitted distribution
After several requests, I’ve written this function, which plots (on log-log
axes) the empirical distribution along with the fitted power-law distribution.
Usage information is included in the file; type ‘help plplot’ at the Matlab
prompt for more information.
plplot.m (Matlab, by Aaron Clauset)
plplot.py (Python, by Joel Ornstein)
Estimating uncertainty in the fitted parameters
This function implements the nonparametric approach for estimating the
uncertainty in the estimated parameters for the power-law fit found by the
plfit function. It too implements both continuous and discrete versions. Usage
information is included in the file; type ‘help plvar’ at the Matlab prompt
for more information.
plvar.m (Matlab, by Aaron Clauset)
plvar.c (C++, by Wim Otte; includes plfit.c)
plvar.py (Python, by Joel Ornstein)
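The nonparametric approach can be illustrated with a simple bootstrap: resample the data with replacement, refit, and take the spread of the refitted values. The sketch below is a simplified illustration rather than the plvar code. It holds xmin fixed for brevity and treats only the continuous case; re-estimating xmin on every resample would capture more of the uncertainty. The names fit_alpha and bootstrap_alpha_sd are hypothetical.

```python
import math
import random
import statistics


def fit_alpha(data, xmin):
    """Continuous MLE of the exponent for the values at or above xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def bootstrap_alpha_sd(data, xmin, reps=200, rng=None):
    """Standard deviation of alpha over bootstrap resamples (xmin held fixed)."""
    rng = rng or random.Random()
    n = len(data)
    estimates = []
    for _ in range(reps):
        # Resample n observations with replacement, then refit.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit_alpha(resample, xmin))
    return statistics.stdev(estimates)


# Example on synthetic data with alpha = 2.5 and xmin = 1.
rng = random.Random(7)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
alpha_sd = bootstrap_alpha_sd(data, xmin=1.0, reps=200, rng=rng)
```

For a tail of n points, the result should be close to the asymptotic standard error (alpha - 1) / sqrt(n).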
Calculating p-value for fitted power-law model
This function implements the Kolmogorov-Smirnov test (which computes a
p-value for the estimated power-law fit to the data) for the power-law
model. As above, it too implements both continuous and discrete versions of
the test. Usage information is included in the file; type ‘help plpva’ at the
Matlab prompt for more information.
plpva.m (Matlab, by Aaron Clauset)
plpva.r (R, by Laurent Dubroca; modified by Neal Walfield)
parplpva2.m (Matlab, by Casper Peterson, uses Parallel Toolbox)
plpva.py (Python, by Joel Ornstein)
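The test can be sketched as a semi-parametric bootstrap: fit the data, generate synthetic data sets that follow the fitted power law above xmin and the empirical values below it, refit each one, and report the fraction whose KS distance is at least as large as the observed one. The code below is an illustrative sketch for the continuous case only and is not the plpva code; the names fit_power_law and gof_p_value, the min_tail cutoff, and the small number of repetitions are assumptions chosen to keep the example short and fast.

```python
import bisect
import math
import random


def fit_power_law(data, min_tail=10):
    """Return (alpha, xmin, ks) for the xmin that minimizes the KS distance."""
    data = sorted(data)
    best = None
    for xmin in sorted(set(data)):
        tail = data[bisect.bisect_left(data, xmin):]
        n = len(tail)
        if n < min_tail or tail[-1] <= xmin:
            break
        alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
        ks = 0.0
        for i, x in enumerate(tail):
            cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
            ks = max(ks, abs(cdf - i / n), abs(cdf - (i + 1) / n))
        if best is None or ks < best[2]:
            best = (alpha, xmin, ks)
    return best


def gof_p_value(data, reps=50, rng=None):
    """Fraction of synthetic data sets that fit worse than the observed data."""
    rng = rng or random.Random()
    alpha, xmin, ks_obs = fit_power_law(data)
    body = [x for x in data if x < xmin]
    p_tail = 1.0 - len(body) / len(data)
    worse = 0
    for _ in range(reps):
        synth = []
        for _ in range(len(data)):
            if rng.random() < p_tail:
                # Tail: draw from the fitted power law.
                u = rng.random()
                synth.append(xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0)))
            else:
                # Body: draw from the observed values below xmin.
                synth.append(rng.choice(body))
        if fit_power_law(synth)[2] >= ks_obs:
            worse += 1
    return worse / reps


# Example on synthetic power-law data (alpha = 2.5, xmin = 1).
rng = random.Random(3)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(150)]
p_value = gof_p_value(data, reps=50, rng=rng)
```

A small p-value indicates that the power law is a poor description of the data. More repetitions give a more precise p-value at a proportionally higher cost.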
Riemann Zeta function
The discrete estimator needs to calculate the Hurwitz Zeta function for
normalization. Matlab includes this function in the Symbolic Math Toolbox
(but be warned that their implementation becomes unstable for large alpha and
xmin, e.g., alpha>7 with xmin>150). There are also free versions available if
you don’t have this toolbox. For instance, Paul Godfrey’s
special functions library (via Matlab Central File Exchange) gives one, which we
mirror here (note, you need both these files; tip to Will Tracy).
deta.m (Matlab, by Paul Godfrey)
zeta.m (Matlab, by Paul Godfrey)
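To show where the function enters, here is a minimal sketch of the Hurwitz zeta function computed by direct summation with an Euler-Maclaurin tail correction, and of the discrete power-law probability it normalizes. This is not Paul Godfrey's implementation: the names hurwitz_zeta and discrete_power_law_pmf and the default number of terms are assumptions for the example, and its accuracy for large alpha and xmin has not been checked.

```python
import math


def hurwitz_zeta(s, q, terms=20):
    """Approximate zeta(s, q) = sum over k >= 0 of (k + q)^(-s), for s > 1, q > 0."""
    if s <= 1.0 or q <= 0.0:
        raise ValueError("requires s > 1 and q > 0")
    # Sum the first `terms` terms directly.
    head = sum((k + q) ** -s for k in range(terms))
    # Approximate the remaining tail with the leading Euler-Maclaurin terms:
    # integral + half of the first omitted term + first derivative correction.
    a = terms + q
    tail = a ** (1.0 - s) / (s - 1.0) + 0.5 * a ** -s + s * a ** (-s - 1.0) / 12.0
    return head + tail


def discrete_power_law_pmf(x, alpha, xmin=1):
    """p(x) = x^(-alpha) / zeta(alpha, xmin) for integers x >= xmin."""
    return x ** -alpha / hurwitz_zeta(alpha, xmin)


# Example: zeta(2, 1) is the Riemann zeta value pi^2 / 6.
zeta_2 = hurwitz_zeta(2.0, 1.0)
pmf_total = sum(discrete_power_law_pmf(x, 2.5, 1) for x in range(1, 2000))
```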
Calculating likelihood-ratio test results
The functions necessary to compute the log likelihood ratio tests are
implemented in the statistical programming
language R. Documentation of these functions is given in a separate file,
and the R functions themselves are in a downloadable tgz file (note: this is
not a proper R package, yet).
R code (by Cosma Shalizi)
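As an illustration of the idea, and not a port of the R code, the sketch below compares a continuous power law against an exponential on the tail x >= xmin. It sums the pointwise log-likelihood differences to get R and uses a normal approximation for a two-sided p-value. The function name likelihood_ratio_pl_vs_exp is hypothetical, and xmin is assumed to be known.

```python
import math
import random


def likelihood_ratio_pl_vs_exp(data, xmin):
    """Return (R, p): R > 0 favors the power law, R < 0 the exponential."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    # Maximum likelihood fits of both models on the tail.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1.0 / (sum(tail) / n - xmin)
    diffs = []
    for x in tail:
        log_pl = math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
        log_exp = math.log(lam) - lam * (x - xmin)
        diffs.append(log_pl - log_exp)
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    # Two-sided p-value for the hypothesis that the true mean difference is zero.
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p


# Example on synthetic power-law data (alpha = 2.5, xmin = 1).
rng = random.Random(11)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
R, p = likelihood_ratio_pl_vs_exp(data, xmin=1.0)
```

The sign of R says which model is favored, and a small p says the sign is unlikely to be a chance fluctuation.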
Download all files
Get the most up-to-date versions of the complete implementations.
Download all Matlab and R files (by Aaron Clauset and Cosma Shalizi)
Download Python package (by Jeff Alstott)
Download Python (2.6) package (by Javier del Molino Matamala)
Download Java package (by Peter Bloem)
Download R package (by Colin Gillespie)
A note about Matlab compatibility
For the Matlab functions written by me (Aaron), all of them were designed to
be compatible with Matlab v7. They are not necessarily compatible with
older versions of Matlab. That being said, it should be possible to make them
compatible as the core functionality does not depend on v7 features.
A note about bugs and alternative implementations
The code here is provided as-is, with no warranty and no guarantees
of technical support or maintenance. If you experience problems while
using the code, please let the author(s) know via email. I am happy to host
(or link to) implementations of any of these functions in other programming
languages, in the interest of facilitating their more widespread use.
However, I cannot provide any technical support for that code.
The original pl* functions (Matlab) were written by Aaron Clauset and the LRT
functions (R) were written by Cosma Shalizi; all other language
implementations were written by members of the wider community.
Finally, if you use our code in an academic publication, it would be
courteous of you to thank me (Aaron) and Cosma in your acknowledgements for
providing you with implementations of the methods. If you use the
implementations by other authors, you should acknowledge them instead.
A note about the data sets
The 24 data sets we studied in the paper were drawn from the literature, and
the proper citations are given in the paper. You can find much more detailed
information, including links to download many of the data sets, on a separate page.
A note about method tutorials
We do not currently have any tutorial information for installing or using
these methods, beyond what we describe in the paper and what is contained in
the help files that go with the Matlab and R files themselves. That being
said, the InterSciWiki at UC Irvine has
a good overview tutorial page that may be of some use, and Willy Lai has
a nice page, with R code, that works through several examples.
Updates
: added a link to the R package by Colin Gillespie.
: replaced plpva.r with an updated version by Neal Walfield.
: WARNING, the zeta function implementation used here is unstable for large alpha (>7) (thanks to David Gleich for pointing this out). If you need it for this range, consider using a better library function for the Hurwitz Zeta function.
: fixed a minor bug in the way plfit, plvar, plpva parse the nowarn and nosmall arguments.
: posted updated version of plfit.r, at the request of
its author Laurent Dubroca.
: posted Joel Ornstein’s Python ports of plfit, plvar, plpva and plplot.
: replaced plfit.r with new version, at request of its author Laurent Dubroca.
: fixed a minor bug in how plfit.m reports the log-likelihood of the fitted data, for the discrete case, after the selection of xmin is done; posted updated version of Wim Otte’s code with the same fix.
: posted Wim Otte’s C++ implementation of plfit and plvar.
: added the option in plfit, plvar and plpva to ‘lock’ xmin to a specific value (thanks to Paul Willems for the suggestion).
: posted Adam Ginsburg’s Python implementation of plfit.
: created a new page with detailed information
about obtaining copies of the 24 empirical data sets we studied.
: fixed a minor bug in the R version of plfit that would cause its
results to disagree slightly with the results from the Matlab version (thanks to Naoki
Masuda for pointing it out).
: fixed a minor bug in the R version of plfit that would cause the
returned KS statistic to be incorrect when xmin=1 (thanks to Jeff Stuckman for
pointing it out).
: try-catch block in integer portion of plfit now defaults to
iterative version if the try block ever fails (thanks to Rajiv Das).
: changed randht, plpva and plvar to only initialize the
pseudo-random number generator on the first time they are called.
: in the integer routines, plfit, plvar and plpva now automatically
switch to a slower but more memory efficient estimation routine when the vectorized
default routine fails (e.g., Out of Memory error when max(x) is extremely large).
: posted Laurent Dubroca’s R implementation of plfit.
: posted the plplot.m function for plotting the fitted
power-law distributions against the empirical data.
: corrected typo in plpva when using hidden ‘sample’ option, and
reordered the commands for ‘limit’ and ‘sample’ throughout (thanks to Klaas Dellschaft).
: corrected typo in argument parsing for randht.m, significant
efficiency improvements to xmin estimation routine in plfit.m, plpva.m and plvar.m
(thanks to Jim Bagrow for suggestions).
: corrected interim reporting in plpva.m; changed plfit.m,
plvar.m and plpva.m to reshape input vector to column format, and to prevent using
continuous approximation in small-sample regime for discrete data.
: corrected a typo in plvar.m, typo in pareto.R, typo in
log-likelihood for discrete cut-off powerlaw and fixed small bug in a plotting
: corrected a typo in plpva.m, typo in pareto.R and updated
compilation instructions in discpowerexp.R.