On a multi-processor machine you can start to get good results in a couple of hours. On single processors you'll need to set aside rather longer. You can also run on a cluster. Also check our chains page to see if we have suitable runs available that you can importance sample from very quickly (typically just seconds to re-compute a few thousand likelihoods).
By default CosmoMC uses a simple Metropolis algorithm, but there are options for slice sampling and more powerful methods of exploring the fast/slow parameter space. The program takes as inputs estimates of central values and posterior uncertainties of the various parameters. The proposal density can use information about parameter correlations from a supplied covariance matrix: use one if possible as it will significantly improve performance. There is an option to estimate the covariance about the best-fit point, and one covariance matrix is supplied for the default base parameters. If you compile and run with MPI (to run across nodes in a cluster), there is an option to dynamically learn the proposal matrix from the covariance of the post-burn-in samples so far. The MPI option also allows you to terminate the computation automatically when a particular convergence criterion is met. MPI is recommended.
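As an illustration of the basic idea only, the following Python sketch runs a Metropolis sampler on a toy two-parameter Gaussian likelihood, drawing proposals from a supplied covariance matrix through its Cholesky factor. It is not CosmoMC's Fortran implementation: the names metropolis, neg_log_like and cholesky_2x2, the toy likelihood and the scale factor are all hypothetical. Each output row stores a weight counting how many steps the chain stayed at that point.

```python
import math
import random


def cholesky_2x2(cov):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def neg_log_like(p):
    """Toy -log(likelihood): a correlated 2D Gaussian (stand-in for real data)."""
    x, y = p
    rho = 0.9
    return 0.5 * (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)


def metropolis(start, cov, n_steps, scale=1.0, seed=1):
    """Return (chain, acceptance_rate); chain rows are (weight, like, params)."""
    rng = random.Random(seed)
    chol = cholesky_2x2(cov)
    current = list(start)
    current_like = neg_log_like(current)
    chain, weight, accepted = [], 1, 0
    for _ in range(n_steps):
        # Correlated Gaussian proposal: current + scale * chol * z
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        trial = [current[0] + scale * chol[0][0] * z0,
                 current[1] + scale * (chol[1][0] * z0 + chol[1][1] * z1)]
        trial_like = neg_log_like(trial)
        # Accept with probability min(1, L_trial / L_current);
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < current_like - trial_like:
            chain.append((weight, current_like, tuple(current)))
            current, current_like, weight = trial, trial_like, 1
            accepted += 1
        else:
            weight += 1  # rejected: the current point gains one more count
    chain.append((weight, current_like, tuple(current)))
    return chain, accepted / n_steps


chain, rate = metropolis([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], 5000)
```

Passing a diagonal matrix instead of the correlated one should lower the acceptance rate for this toy target, which is the practical reason for supplying a covariance matrix.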
There are two programs supplied: cosmomc and getdist. The first does the actual Monte-Carlo and produces sets of .txt chain files and (optionally) .data output files (the binary .data files include the theoretical CMB power spectra etc.). The "getdist" program analyses the .txt files, calculating statistics, and outputs files for the requested 1D, 2D and 3D plots (and could be used independently of the main cosmomc program). The "cosmomc" program also does post processing on .data files, for example doing importance sampling with new data.
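The idea behind importance sampling with new data can be sketched in a few lines of Python. This is a minimal illustration working on an in-memory list of samples, not the cosmomc post-processing code (which reads the binary .data files); the function name importance_reweight and the toy likelihood are hypothetical. Each sample's weight is multiplied by the likelihood of the new data, and the stored -log(likelihood) is increased accordingly.

```python
import math


def importance_reweight(samples, new_neg_log_like):
    """Re-weight samples for additional data.

    samples: list of (weight, like, params), with like = -log(likelihood).
    new_neg_log_like: function returning -log(likelihood) of the new data
    for a given params tuple (hypothetical user-supplied function).
    """
    extras = [new_neg_log_like(params) for _, _, params in samples]
    # Subtract the smallest value so the best sample keeps its weight and
    # the exponential cannot underflow for all samples at once.
    offset = min(extras)
    out = []
    for (weight, like, params), extra in zip(samples, extras):
        out.append((weight * math.exp(-(extra - offset)), like + extra, params))
    return out


# Toy example: three samples of one parameter, new data preferring x = 1.
samples = [(1.0, 5.0, (-1.0,)), (1.0, 4.0, (0.0,)), (1.0, 5.0, (1.0,))]
reweighted = importance_reweight(samples, lambda p: 0.5 * (p[0] - 1.0) ** 2)
```

This only works well when the new data are consistent with the original posterior; if most samples end up with negligible weight, new chains are needed.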
Please e-mail details of any bugs or enhancements to Antony Lewis. If you have any questions please ask in the CosmoCoffee computers and software forum. You can also read answers to other people's questions there.
Downloading and Compiling
Using MPI simplifies running several chains and proposal optimization. MPI can be used with OpenMP: generally you want to use OpenMP to use all the shared-memory processors on each node of a cluster, and MPI to run multiple chains on different nodes (the program can also just be run on a single CPU). The Intel compilers are changing frequently; a file Makefile_intel is supplied that you can use with the 7.1 versions that give problems.
You can download the Intel f90 compiler for Linux here (earlier versions of ifc 8 did not work; get Build 20040716Z or later. You will also need MKL or some other LAPACK library installation). There is also now a GNU F95 compiler you could try. Note that version 7.4 of the SGI compiler is buggy - install 7.4.1m compilers plus patch 5292 [or comment out "flush" in utils.F90]. Please let me know if you have specific fixes for other compilers.
Using (Compaq) Visual Fortran there's no need to use the Makefile, just open cosmomc.dsw in the source folder, and set params.ini as the program argument under Project, Settings, Debug and set the working directory to ..\. Under Tools, Options, Directories set the include path to [cxml path]/CXML/INCLUDE and lib path to [cxml path]/CXML/LIB. Don't install the 6.6C3 update as it gives compiler errors (6.6B is fine). You then need to add healpix and cfitsio files to your project depending on where they are on your system.
Note that to compile cosmomc you need to link to LAPACK (for doing matrix diagonalization, etc.) - you may need to edit the Makefile to specify where this is on your system.
To change the l_max which is used for generating Cls you'll need to edit the value in cmbtypes.f90, run "make clean" then "make" to rebuild everything. Note l_max should be 50 larger than the largest l which you need accurately. You can also change matter_power_lnzsteps, the number of redshifts at which matter power spectra are sampled.
The default code includes polarization. You can edit the num_cls parameter in cmbtypes.f90 to include just temperature (num_cls=1), TT, TE and EE (num_cls=3) or TT, TE, EE and BB (num_cls=4). You will need the last option if you are including tensors and using polarized data. You can use temperature-only datasets with num_cls 3 or 4; the only disadvantage is that it will be marginally slower, and the .data files will be substantially larger. For WMAP data you need num_cls = 3 or 4.
See BibTex file for relevant citations.
See the supplied params.ini file for a fairly self-explanatory list of input parameters and options. The file_root entry gives the root name for all files produced. Running using MPI on a cluster is recommended if possible as you can automatically handle convergence testing and stopping.
./cosmomc params.ini
The samples will be in file_root.txt, etc. You can start several instances of the program generating separate chains using
./cosmomc params.ini 1
./cosmomc params.ini 2
etc.
In this case samples will be in the files file_root_NN.txt, where NN labels the chain number.
mpirun -np 4 ./cosmomc params.ini
There is also a supplied perl script runMPI.pl that you may be able to adapt for submitting jobs to a PBS queue, running lamboot, etc, e.g.
perl runMPI.pl params 4
to run 4 chains over four nodes using the params.ini parameters file (the script is set up by default for the CITA cluster - edit ppn=2 to the number of CPUs per node you have). On the UK Cosmos computer see the FAQ below.
If things go wrong check the .log and any error files in your cosmomc/scripts directory.
weight like param1 param2 param3 ...
The weight gives the number of samples (or importance weight) with these parameters. like gives -log(likelihood). The getdist program could be used completely independently of the cosmomc program.
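Because the chain format is plain whitespace-separated text, it is easy to read outside getdist. The following Python sketch assumes only the column layout described above (weight, like, then parameters); the helper names read_chain, weighted_mean and best_fit are hypothetical, and the numbers in the example are made up for illustration.

```python
def read_chain(lines):
    """Parse chain rows into (weight, like, params) tuples.

    lines: any iterable of text lines, e.g. an open file_root.txt file.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        rows.append((values[0], values[1], values[2:]))
    return rows


def weighted_mean(rows, index):
    """Posterior mean of parameter number `index` (0-based) using the weights."""
    total = sum(weight for weight, _, _ in rows)
    return sum(weight * params[index] for weight, _, params in rows) / total


def best_fit(rows):
    """Row with the smallest -log(likelihood) among the samples."""
    return min(rows, key=lambda row: row[1])


# Made-up example rows: weight, like, param1, param2
example = [
    "2 10.5 0.020 0.11",
    "1 11.0 0.030 0.12",
    "3 10.2 0.025 0.10",
]
rows = read_chain(example)
mean_param1 = weighted_mean(rows, 0)
best = best_fit(rows)
```

For a real chain, pass the open file instead of the example list, e.g. `with open("file_root.txt") as f: rows = read_chain(f)`.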
Run getdist distparams.ini to process the chains specified in the parameter input file distparams.ini. This should be fairly self-explanatory, in the same format as the cosmomc parameter input file. Note that sigma_8 is only computed if you are including LSS data when generating the chain (as computing the matter power spectrum slows things down considerably; you can post-process to compute sigma_8 if you like, see action=1 in the cosmomc input file).
GetDist Parameters
Output Text Files
Plotting
Parameter labels are set in distparams.ini - if any are blank the parameter is ignored. You can also specify which parameters to plot; if parameters are not specified for the 2D plots or the colour of the 3D plots, getdist automatically works out the most correlated variables and uses them. The data files used by SuperMongo and MatLab are output to the plot_data directory.
Performance of the MCMC can be improved by using parameters which have a close to Gaussian posterior distribution. The default parameters (which get implicit flat priors) are
Parameters like H_0 and Omega_lambda are derived from the above. Using theta rather than H_0 is more efficient as it is much less correlated with other parameters. There is an implicit prior 40 < H_0 < 100. The .txt chain files list derived parameters after the 13 base parameters. By default these are ΩΛ (14), Age/Gyr (15), Ωm (16), σ8 (17), zre (18), r10 (19) and H0 (20). r10 is the ratio of the tensor to scalar Cl at l=10.
Since the program uses a covariance matrix for the parameters, it knows about (or will learn about) linear combination degeneracies. In particular ln[10^10 A_s] - 2*tau is well constrained, since exp(-2tau)A_s determines the overall amplitude of the observed CMB anisotropy (thus the above parameterization explores the tau-A degeneracy efficiently). The supplied covariance matrix will do this even if you add new parameters.
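The degeneracy can be checked numerically. In the Python sketch below, two parameter points with different tau but the same exp(-2tau)A_s give the same value of ln[10^10 A_s] - 2*tau, so moving along the degeneracy leaves this combination unchanged. The function names and the numerical values of A_s and tau are made up for illustration.

```python
import math


def observed_amplitude(A_s, tau):
    """Overall amplitude of the observed CMB anisotropy, exp(-2 tau) A_s."""
    return A_s * math.exp(-2.0 * tau)


def degeneracy_combination(A_s, tau):
    """The well-constrained combination ln[10^10 A_s] - 2 tau."""
    return math.log(1e10 * A_s) - 2.0 * tau


# Illustrative (made-up) starting point.
A1, tau1 = 2.3e-9, 0.09

# Raise tau and compensate with A_s so exp(-2 tau) A_s stays fixed.
tau2 = 0.15
A2 = A1 * math.exp(2.0 * (tau2 - tau1))

same_amplitude = math.isclose(observed_amplitude(A1, tau1),
                              observed_amplitude(A2, tau2))
same_combination = math.isclose(degeneracy_combination(A1, tau1),
                                degeneracy_combination(A2, tau2))
```

Both flags come out true because ln[10^10 exp(-2tau) A_s] = ln[10^10 A_s] - 2*tau, so the combination is just the log of the observed amplitude.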
Changing parameters does in principle change the results as each base parameter has a flat prior. However for well constrained parameters this effect is very small. In particular using theta rather than H_0 has a small effect on marginalized results.
The above parameterization does make use of some knowledge about the physics, in particular the (approximate) formula for the sound horizon. Also supplied is a params_H.f90 file which uses H_0,z_re and A_s instead of theta, tau and log(10^10 A_s) which is more generic. Though slower to converge, this may be useful if you want to play around with different extended models - just edit the Makefile to use params_H.f90 instead of params_CMB.f90. Sample input files and covariance matrix along with params_H.f90 are available here. Since the parameters have a different meaning in this parameterization, you should not try to mix .covmat (or other) files with those from the default parameterization.
The supplied CMB datasets that are used for computing the likelihood are given in
*.dataset files in the data directory (these may not be up to date). These are in a standard .ini format,
and contain the data points and errors, data name, calibration and beam
uncertainties, and window file directory. Code for handling these is in cmbdata.f90. The WMAP data is handled separately as a special case. Various simple priors are encoded in calclike.f90.
The most likely reason to modify the code is to change l_max, num_cls, or matter_power_lnzsteps, all specified in cmbtypes.f90. To change the number of parameters you'll need to change the constants in settings.f90. Run "make clean" after changing settings before re-compiling. When adding just one additional parameter it's often easiest to re-interpret one of the default parameters rather than adding in new parameters.
You are encouraged to examine what the code is doing and consider carefully
changes you may wish to make. For example, the results can depend on the
parameterization. You may also want to use different CAMB modules, e.g.
slow-roll parameters for inflation, or use a fast approximator. The main
source code files you may want to modify are
The .ini file comments should explain the other options.
GetDist produces a file_root.sm file for use with sm. Run sm < file_root.sm to produce file_root.ps containing a plot of the 1D marginalized posterior distributions.
GetDist produces MatLab '.m' files to do 1D, 2D and 3D plots. Type file_root into a MatLab
window set to the directory containing the .m files to produce 1D marginalized plots. You can also do
You can use the blue matlab script (in the mscripts directory) to change to a B&W-friendly colormap (see also other colormaps in that directory). To compare two different sets of chains set compare_num=1 in the .ini file, and compare1 to the root name of some chains you have previously run GetDist on.
Some matlab scripts are also supplied for making custom matlab plots using the files produced by GetDist (see also CosmoloGui). The scripts are in the mscripts directory - you will probably want to add this to your matlab path using e.g. addpath('mscripts'). The scripts are:
confid2D('file_root1',8,17,'-k','b');
hold on;
confid2D('file_root2',8,17,'-k','r');
hold off;
show_contours_behind;
This last (optional) command is a supplied script which will show dotted lines lying behind other solid contours. If the last colour argument is omitted, confid2D plots unfilled contours only.
Convergence diagnostics
The getdist program will output convergence diagnostics, both short summary information when getdist is run, and also more detailed information in the file_root.converge file. When running with MPI the first two of the parameters below can also be calculated when running the chains for comparison with a stopping criterion (see the .ini input file).
Differences between GetDist and MPI run-time statistics
GetDist will cut out ignore_rows from the beginning of each chain, then compute the R statistic using the last half of the remaining samples. The MPI run-time statistic uses the last half of all of the samples. In addition, GetDist will use all the parameters, including derived parameters. If a derived parameter has poor convergence this may show up when running GetDist but not when running the chain (however the eigenvalues of covariance of means are computed using only base parameters). The run-time values also use thinned samples (by default every one in ten), whereas GetDist will use all of them. GetDist will allow you to use only subsets of the chains.
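To make the "cut, then use the last half" procedure concrete, here is a minimal Python sketch of a Gelman-Rubin style statistic for a single parameter: the variance of the chain means divided by the mean of the chain variances. This particular definition is an assumption chosen for illustration, as is treating ignore_rows as an integer row count and ignoring sample weights; the exact form used by GetDist may differ. The function name r_minus_one is hypothetical.

```python
def r_minus_one(chains, ignore_rows=0):
    """Variance of chain means over mean of chain variances.

    chains: list of chains, each a list of values of one parameter
    (unweighted samples, a simplification of the real chain files).
    ignore_rows: number of rows cut from the start of each chain (assumed
    to be an integer count here).
    """
    means, variances = [], []
    for chain in chains:
        kept = chain[ignore_rows:]          # drop the burn-in rows
        half = kept[len(kept) // 2:]        # keep the last half of the rest
        n = len(half)
        mean = sum(half) / n
        means.append(mean)
        variances.append(sum((x - mean) ** 2 for x in half) / (n - 1))
    k = len(means)
    grand_mean = sum(means) / k
    between = sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    within = sum(variances) / k
    return between / within


# Two chains that agree, and two whose means differ by 10.
agree = r_minus_one([[0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0, 1]])
disagree = r_minus_one([[0, 1, 0, 1, 0, 1, 0, 1], [10, 11, 10, 11, 10, 11, 10, 11]])
```

Small values indicate that the chains agree on the parameter mean relative to its spread; the second example gives a much larger value because the chain means are far apart.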
Parameterizations
Hard coded priors
The default installation hard codes a few priors; in some instances you may wish to edit these:
There is no prior on the positivity of Omega_Lambda.
Data
There is also built-in support for 2dF and (a few years old) supernovae observations. Adding new data sets should be quite straightforward - you are encouraged to donate anything you add to be used by everyone.
Programming
This defines what the input variables mean. Change this to use different
variables. You can change which parameterization file to use in the Makefile. params_H.f90 is also supplied for using z_re, A_s and H_0 instead of tau, log(A_s) and theta.
You need to change this file to specify the l_max used. Chains can
be generated at low l_max, then post-processed with a compile using a higher
l_max. You can also change the num_cls number of (temperature plus polarization) Cls to compute and store.
This defines the number of parameters and their types. You will need
to change this if you use more parameters.
This reads in the CMB .dataset information and computes likelihoods.
You may wish to edit this, for example to use likelihood distributions
for the band powers, or to compute the likelihood from actual polarized data. This version assumes polarized data points are an arbitrary combination of the raw TT, TE, EE, and BB Cls, as specified in the window files in data/windows. WMAP data is handled as a special case.
Analogous to cmbdata, but for matter power spectrum measurements. Reads in generic dataset files; supports (fixed) covariance matrices.
This is the proposal density and related constants and subroutines. The efficiency
of MCMC is quite dependent on the proposal. Fast+slow and fast parameter subspaces are proposed separately. See the notes for a discussion of the proposal density and use of fast and slow parameters.
Routines for generating Cls, matter power spectra and sigma8 from CAMB.
Replace this file to use other generators, e.g. a fast approximator like
CMBfit or DASH.
As of May 2006, uses SNLS by default. Edit to use Riess gold. (thanks to David Rapetti, Jochen Weller and Anze Slosar).
Lyman alpha data (thanks to Matteo Viel, J.Lesgourgues). Can also replace with SDSSLy-a-v3.f90 and recompile to use SDSS data (thanks to Kevork Abazajian). Note this is only tested and likely to be reliable for standard LCDM models.
Reads in .data files and re-calculates likelihoods or theory predictions. Unused in MCMC runs.
Add in calls to other likelihood calculators, etc., here.
Main program that reads in parameters and calls MCMC or post-processing.
The "getdist" program for analysing chains. Write your own importance
weighting function or parameter mapping.
Version History
Information Theory, Inference and Learning Algorithms
Raftery and Lewis convergence diagnostics
There are also some notes on the proposal density, fast and slow parameters, and slice sampling as used by CosmoMC. See also the BibTex file of CosmoMC references you should cite, along with some references
of potential interest.
FAQ
cfitsio = /home/cosmos/share-ia64
healpix = /home/cosmos/share-ia64/Healpix_2.01
WMAP3 = /home/cosmos/ccc-cam/aml1005/WMAP3
Feel free to ask questions (and read answers to other people's) on the CosmoCoffee software forum. There is also a FAQ in the CosmoloGUI readme.