CosmoMC Readme

This version April 2006. Check the web page for the latest version.


See also the CosmoloGUI readme for information about how to make plots from samples using an easy-to-use graphical user interface.

Introduction

CosmoMC is a Fortran 90 Markov-Chain Monte-Carlo (MCMC) engine for exploring cosmological parameter space, together with code for analysing Monte-Carlo samples and importance sampling. The code does brute force (but accurate) theoretical matter power spectrum and Cl calculations with CAMB. See the paper for an introduction, descriptions, and typical results from some pre-WMAP data.

On a multi-processor machine you can start to get good results in a couple of hours. On single processors you'll need to set aside rather longer. You can also run on a cluster. Also check our chains page to see if we have suitable runs available that you can importance sample from very quickly (typically just seconds to re-compute a few thousand likelihoods).

By default CosmoMC uses a simple Metropolis algorithm, but there are options for slice sampling and more powerful methods of exploring the fast/slow parameter space. The program takes as inputs estimates of central values and posterior uncertainties of the various parameters. The proposal density can use information about parameter correlations from a supplied covariance matrix: use one if possible as it will significantly improve performance. There is an option to estimate the covariance about the best-fit point, and one covariance matrix is supplied for the default base parameters. If you compile and run with MPI (to run across nodes in a cluster), there is an option to dynamically learn the proposal matrix from the covariance of the post-burn-in samples so far. The MPI option also allows you to terminate the computation automatically when a particular convergence criterion is met. MPI is recommended.
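As a rough illustration of the Metropolis idea described above, here is a minimal Python sketch. It is not CosmoMC's Fortran implementation: it assumes an uncorrelated Gaussian proposal (no covariance matrix), and the names metropolis and toy_like and the toy two-parameter likelihood are all hypothetical. It does record each point as a row of weight, -log(likelihood) and parameter values, where the weight counts how many steps the chain stayed at that point.

```python
import math
import random

def metropolis(neg_log_like, start, widths, n_steps, seed=0):
    """Minimal Metropolis sampler with a symmetric Gaussian proposal.

    Returns rows of [weight, like, params...], where weight counts how many
    steps the chain stayed at that point and like = -log(likelihood).
    """
    rng = random.Random(seed)
    current = list(start)
    current_like = neg_log_like(current)
    rows = []
    weight = 1
    for _ in range(n_steps):
        # Propose a move; widths stand in for estimated posterior uncertainties.
        proposal = [x + rng.gauss(0.0, w) for x, w in zip(current, widths)]
        proposal_like = neg_log_like(proposal)
        # Accept with probability min(1, L_new / L_old).
        if math.log(1.0 - rng.random()) < current_like - proposal_like:
            rows.append([weight, current_like] + current)
            current, current_like, weight = proposal, proposal_like, 1
        else:
            weight += 1  # rejected: the current point gains one more sample
    rows.append([weight, current_like] + current)
    return rows

# Toy target (an assumption for illustration): a unit Gaussian in two parameters.
def toy_like(p):
    return 0.5 * sum(x * x for x in p)

chain = metropolis(toy_like, start=[0.5, -0.5], widths=[1.0, 1.0], n_steps=2000)
total_weight = sum(row[0] for row in chain)
```

Replacing the independent Gaussian draws with draws from a correlated proposal is what a supplied covariance matrix makes possible; that step is omitted here to keep the sketch short.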

There are two programs supplied: cosmomc and getdist. The first does the actual Monte-Carlo and produces sets of .txt chain files and (optionally) .data output files (the binary .data files include the theoretical CMB power spectra etc.). The "getdist" program analyses the .txt files, calculating statistics, and outputs files for the requested 1D, 2D and 3D plots (it could be used independently of the main cosmomc program). The "cosmomc" program also does post-processing on .data files, for example importance sampling with new data.

Please e-mail details of any bugs or enhancements to Antony Lewis. If you have any questions please ask in the CosmoCoffee computers and software forum. You can also read answers to other people's questions there.

Downloading and Compiling

If you don't want to use WMAP3 you can compile with -DNOWMAP; there is then no need to link to cfitsio or healpix.

You can download the Intel f90 compiler for Linux here (earlier versions of ifc 8 did not work; get Build 20040716Z or later). You will also need MKL or some other LAPACK library installation. There is also now a GNU F95 compiler you could try. Note that version 7.4 of the SGI compiler is buggy: install the 7.4.1m compilers plus patch 5292 [or comment out "flush" in utils.F90]. Please let me know if you have specific fixes for other compilers.

Using (Compaq) Visual Fortran there's no need to use the Makefile, just open cosmomc.dsw in the source folder, and set params.ini as the program argument under Project, Settings, Debug and set the working directory to ..\. Under Tools, Options, Directories set the include path to [cxml path]/CXML/INCLUDE and lib path to [cxml path]/CXML/LIB. Don't install the 6.6C3 update as it gives compiler errors (6.6B is fine). You then need to add healpix and cfitsio files to your project depending on where they are on your system.

Note that to compile cosmomc you need to link to LAPACK (for doing matrix diagonalization, etc.) - you may need to edit the Makefile to specify where this is on your system.

To change the l_max which is used for generating Cls you'll need to edit the value in cmbtypes.f90, run "make clean" then "make" to rebuild everything. Note l_max should be 50 larger than the largest l which you need accurately. You can also change matter_power_lnzsteps, the number of redshifts at which matter power spectra are sampled.

The default code includes polarization. You can edit the num_cls parameter in cmbtypes.f90 to include just temperature (num_cls=1), TT, TE and EE (num_cls=3) or TT, TE, EE and BB (num_cls=4). You will need the last option if you are including tensors and using polarized data. You can use temperature-only datasets with num_cls 3 or 4, the only disadvantage being that it will be marginally slower, and the .data files will be substantially larger. For WMAP data you need num_cls = 3 or 4.

See BibTex file for relevant citations.

Running and input parameters

See the supplied params.ini file for a fairly self-explanatory list of input parameters and options. The file_root entry gives the root name for all files produced. Running using MPI on a cluster is recommended if possible as you can automatically handle convergence testing and stopping.


Input Parameters

Output files

Analysing samples and plotting

The getdist program analyses text files produced by the cosmomc program. These are in the format

    weight like param1 param2 param3 ...

The weight gives the number of samples (or importance weight) with these parameters. like gives -log(likelihood). The getdist program could be used completely independently of the cosmomc program.
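Since the chain format is just whitespace-separated columns, it is easy to read outside getdist. The following Python sketch shows one way to parse such rows and form weighted means and the best-fit sample. The function names and the three sample rows are made up for illustration and are not real CosmoMC output.

```python
def parse_chain_lines(lines):
    """Parse 'weight like param1 param2 ...' rows into (weight, like, params)."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        rows.append((values[0], values[1], values[2:]))
    return rows

def weighted_means(rows):
    """Weighted mean of each parameter column."""
    total = sum(w for w, _, _ in rows)
    n_params = len(rows[0][2])
    return [sum(w * p[i] for w, _, p in rows) / total for i in range(n_params)]

# Made-up rows for illustration only.
sample = [
    "2.0  10.5  0.022  0.11",
    "1.0  10.1  0.024  0.12",
    "1.0  11.0  0.020  0.10",
]
rows = parse_chain_lines(sample)
means = weighted_means(rows)
# like is -log(likelihood), so the smallest value is the best-fit sample.
best = min(rows, key=lambda r: r[1])
```

To use it on a real chain, pass an open file object instead of the sample list.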

Run getdist distparams.ini to process the chains specified in the parameter input file distparams.ini. This should be fairly self-explanatory, and is in the same format as the cosmomc parameter input file. Note that sigma_8 is only computed if you are including LSS data when generating the chain, as computing the matter power spectrum slows things down considerably. You can post-process to compute sigma8 if you like; see action=1 in the cosmomc input file.

GetDist Parameters

The .ini file comments should explain the other options.

Output Text Files

Plotting

Parameter labels are set in distparams.ini - if any are blank the parameter is ignored. You can also specify which parameters to plot, or if parameters are not specified for the 2D plots or the colour of the 3D plots getdist automatically works out the most correlated variables and uses them. The data files used by SuperMongo and MatLab are output to the plot_data directory.

Convergence diagnostics

The getdist program will output convergence diagnostics, both short summary information when getdist is run, and also more detailed information in the file_root.converge file. When running with MPI the first two of the parameters below can also be calculated when running the chains for comparison with a stopping criterion (see the .ini input file).

Differences between GetDist and MPI run-time statistics

GetDist will cut out ignore_rows from the beginning of each chain, then compute the R statistic using the last half of the remaining samples. The MPI run-time statistic uses the last half of all of the samples. In addition, GetDist will use all the parameters, including derived parameters. If a derived parameter has poor convergence this may show up when running GetDist but not when running the chain (however the eigenvalues of the covariance of means are computed using only base parameters). The run-time values also use thinned samples (by default every one in ten), whereas GetDist will use all of them. GetDist will allow you to use only subsets of the chains.
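The difference in which samples each statistic sees can be sketched in a few lines of Python. The two selection functions follow the description above. The variance_ratio function is only an assumption about the general form of such a statistic (between-chain variance of the means divided by the mean within-chain variance); the exact definition used by GetDist and by the run-time check is in the source code and may differ. Sample weights, thinning and multiple parameters are ignored here, and all names are hypothetical.

```python
from statistics import mean, pvariance

def last_half(samples):
    return samples[len(samples) // 2:]

def getdist_selection(chain, ignore_rows):
    """Drop ignore_rows from the start, then keep the last half of the rest."""
    return last_half(chain[ignore_rows:])

def runtime_selection(chain):
    """Keep the last half of all samples (no rows are cut first)."""
    return last_half(chain)

def variance_ratio(chains):
    """Between-chain variance of the means over the mean within-chain variance.

    An assumed, generic form of a convergence ratio for one parameter;
    small values suggest the chains agree with each other.
    """
    means = [mean(c) for c in chains]
    within = mean(pvariance(c) for c in chains)
    return pvariance(means) / within

# Toy single-parameter chains (illustrative numbers only).
chain = [float(i) for i in range(10)]
kept_by_getdist = getdist_selection(chain, ignore_rows=2)
kept_at_runtime = runtime_selection(chain)
ratio = variance_ratio([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

With ten samples and ignore_rows=2 the two selections differ by one sample; for long chains with a sizeable ignore_rows the difference can be larger, which is one reason the two numbers need not agree.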

Parameterizations

Performance of the MCMC can be improved by using parameters which have a close to Gaussian posterior distribution. The default parameters (which get implicit flat priors) are

  1. ombh2 - the physical baryon density
  2. omch2 - the physical dark matter density
  3. 100*theta - 100*(the ratio of the [approx] sound horizon to the angular diameter distance)
  4. tau - the optical depth
  5. omk - omega_K
  6. nufrac - the fraction of the dark matter energy in the form of massive neutrinos
  7. w - the (assumed constant) equation of state of the dark energy (taken to be quintessence)
  8. n_s - the scalar spectral index
  9. n_t - the tensor spectral index
  10. n_run - the running of the scalar spectral index
  11. ln[10^10 A_s] - A_s is the primordial superhorizon power in the curvature perturbation on 0.05Mpc^{-1} scales (i.e. this is an amplitude parameter)
  12. amp_ratio - the ratio A_t/A_s, where A_t is the primordial power in the transverse traceless part of the metric tensor
  13. is an unused fast parameter

Parameters like H_0 and Omega_lambda are derived from the above. Using theta rather than H_0 is more efficient as it is much less correlated with other parameters. There is an implicit prior 40 < H_0 < 100. The .txt chain files list derived parameters after the 13 base parameters. By default these are ΩΛ (14), Age/Gyr (15), Ωm (16), σ8 (17), zre (18), r10 (19) and H0 (20). r10 is the ratio of the tensor to scalar Cl at l=10.
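Because each chain row starts with the weight and like columns, parameter number n sits in the (n+2)-th field of the row. A small Python sketch of that bookkeeping, assuming the default parameterization listed above; the dictionary keys are hypothetical labels chosen for this example, and the row is synthetic.

```python
# Parameter numbers as listed in the text for the default parameterization.
# The key names are hypothetical labels, not identifiers used by CosmoMC.
DERIVED = {"omega_lambda": 14, "age_gyr": 15, "omega_m": 16,
           "sigma_8": 17, "z_re": 18, "r10": 19, "H0": 20}

def column_index(param_number):
    """Zero-based field index: weight is 0, like is 1, parameter 1 is 2."""
    return param_number + 1

def extract(line, param_number):
    """Return one parameter value from a 'weight like param1 ...' row."""
    return float(line.split()[column_index(param_number)])

# Synthetic row: weight, like, then parameters 1..20 set to their own number.
fake_row = "1 0 " + " ".join(str(n) for n in range(1, 21))
h0 = extract(fake_row, DERIVED["H0"])
```

If you change the number of base parameters, the derived parameter numbers shift accordingly, so the table above would need updating.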

Since the program uses a covariance matrix for the parameters, it knows about (or will learn about) linear combination degeneracies. In particular ln[10^10 A_s] - 2*tau is well constrained, since exp(-2tau)A_s determines the overall amplitude of the observed CMB anisotropy (thus the above parameterization explores the tau-A degeneracy efficiently). The supplied covariance matrix will do this even if you add new parameters.
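The degeneracy can be checked numerically: raising tau while raising A_s by the compensating factor leaves ln[10^10 A_s] - 2*tau unchanged. A short Python sketch, with illustrative numbers only and hypothetical function names:

```python
import math

def log_amplitude(a_s):
    """The base parameter ln[10^10 A_s]."""
    return math.log(1e10 * a_s)

def observed_combination(log_a, tau):
    """ln[10^10 A_s] - 2*tau, which equals ln[10^10 A_s exp(-2*tau)]."""
    return log_a - 2.0 * tau

# Illustrative values, not fitted results.
a_s, tau = 2.3e-9, 0.09
base = observed_combination(log_amplitude(a_s), tau)

# Raise tau by 0.05 and A_s by exp(2 * 0.05): the combination is unchanged.
shifted = observed_combination(log_amplitude(a_s * math.exp(0.1)), tau + 0.05)
```

Here base and shifted agree to rounding error, which is the direction along which the chain can move freely without changing the overall observed amplitude.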

Changing parameters does in principle change the results as each base parameter has a flat prior. However for well constrained parameters this effect is very small. In particular using theta rather than H_0 has a small effect on marginalized results.

The above parameterization does make use of some knowledge about the physics, in particular the (approximate) formula for the sound horizon. Also supplied is a params_H.f90 file which uses H_0,z_re and A_s instead of theta, tau and log(10^10 A_s) which is more generic. Though slower to converge, this may be useful if you want to play around with different extended models - just edit the Makefile to use params_H.f90 instead of params_CMB.f90. Sample input files and covariance matrix along with params_H.f90 are available here. Since the parameters have a different meaning in this parameterization, you should not try to mix .covmat (or other) files with those from the default parameterization.

Hard coded priors

The default installation hard codes a few priors; in some instances you may wish to edit these. There is no prior on the positivity of Omega_Lambda.

Data

The supplied CMB datasets that are used for computing the likelihood are given in *.dataset files in the data directory (these may not be up to date). These are in a standard .ini format, and contain the data points and errors, data name, calibration and beam uncertainties, and window file directory. Code for handling these is in cmbdata.f90. The WMAP data is handled separately as a special case. Various simple priors are encoded in calclike.f90.

There is also built-in support for 2dF and (a few years old) supernovae observations. Adding new data sets should be quite straightforward - you are encouraged to donate anything you add to be used by everyone.

Programming

The most likely reason to modify the code is to change l_max, num_cls, or matter_power_lnzsteps, all specified in cmbtypes.f90. To change the number of parameters you'll need to change the constants in settings.f90. Run "make clean" after changing settings before re-compiling. When adding just one additional parameter it's often easiest to re-interpret one of the default parameters rather than adding in new parameters.

You are encouraged to examine what the code is doing and consider carefully changes you may wish to make. For example, the results can depend on the parameterization. You may also want to use different CAMB modules, e.g. slow-roll parameters for inflation, or use a fast approximator. The main source code files you may want to modify are

Version History

Reference links

Probabilistic Inference Using Markov Chain Monte Carlo Methods

Information Theory, Inference and Learning Algorithms

MCMC Preprint Service

Raftery and Lewis convergence diagnostics

There are also some notes on the proposal density, fast and slow parameters, and slice sampling as used by CosmoMC. See also the BibTex file of CosmoMC references you should cite, along with some references of potential interest.

FAQ

  1. What are the dotted lines on the plots?
    Dotted lines are mean likelihoods of samples, solid lines are marginalized probabilities. For Gaussian distributions they should be the same. For skew distributions, or if chains are poorly converged, they will not be. Sometimes there is a much better fit (giving high mean likelihood) in a small region of parameter space which has very low marginalized probability. There is more discussion in the original paper.
     
  2. What's in the .likestats file produced by getdist?
    These are values for the best-fit sample, and projections of the n-Dimensional confidence region. Usually people only look at the best-fit values. The n-D limits give you some idea of the range of the posterior, and are much more conservative than the marginalized limits.
     
  3. I'm writing a paper "constraints on X". How do I add X to cosmomc?
    Often the easiest thing to do is to re-interpret one of the unused standard parameters, e.g. w, A_T, etc, depending on whether the parameter is "fast" or not (if in doubt, use one of the slow parameters like w or f_nu). You just need to change the CMBToCAMB routine in CMB_Cls_Simple.f90 so that your parameter is correctly fed into CAMB, change limits appropriately in the params.ini file, etc. See the undocumented references to w_is_w in the code for how w can be re-interpreted as a ratio of isocurvature to adiabatic in CAMB's initial conditions. If you need to add more than a couple of parameters you'll probably instead need to edit settings.f90 to increase the number of parameters, and edit the .ini file accordingly. Slow parameters should have numbers lower than fast parameters (i.e. to insert 5 slow parameters, the index of the fast parameters in the .ini file should increase by 5).
     
  4. How do I submit cosmomc jobs to the LSF Cambridge COSMOS supercomputer queues?
    The data files are already on cosmos, so just download the code from LAMBDA and then use 'ln -s /home/cosmos-tmp/WMAP data' to make the link from the WMAP/data directory to the existing files. To run use the runCosmomc script that should be installed on your path: type just 'runCosmomc' for command options. Use 'module load cosmolib' and linking options are
    cfitsio = /home/cosmos/share-ia64
    healpix = /home/cosmos/share-ia64/Healpix_2.01
    WMAP3 = /home/cosmos/ccc-cam/aml1005/WMAP3
    
  5. Why do some chains sometimes appear to get stuck?
    Usually this is because the starting position for the chain is a long way from the best fit region. Since the marginal distributions of e.g. A_s are rather narrow, it can take a while for chains to move into an acceptable region of A_s exp(-2τ). The cure is to check your starting parameter values and start widths (make sure the widths are not too wide), or to use a sampling method that is more robust (e.g. use sampling_method = 4). If you are patient, stuck chains should eventually find a sensible region of parameter space anyway. Occasionally the starting position may be in a corner of parameter space so that prior ranges prevent any reasonable proposed moves. In this case check your starting values and ranges, or just try re-starting the chain (a different random starting position will possibly be OK).
     
  6. How do I simulate futuristic CMB data? See CosmoCoffee.

Feel free to ask questions (and read answers to other people's) on the CosmoCoffee software forum. There is also a FAQ in the CosmoloGUI readme.


Antony Lewis.