PRoNTo Frequently Asked Questions (FAQ)



    Before installing PRoNTo, please make sure that your system is supported. All the software requirements are listed here. Special attention should be paid to the MATLAB version (R2008 and up, avoiding R2017b). SPM12, with the latest updates, should be installed. Download the latest version of PRoNTo from here. Open MATLAB and add both the PRoNTo and SPM (if not previously added) paths to the MATLAB Search Path. You can do this either by using MATLAB's graphical user interface (‘Add folder’ option), or by typing in the Command Window:

    addpath('path_to_pronto')

    addpath('path_to_spm')

    For more information about MATLAB's paths, please refer to here.

    To start PRoNTo, type either "pronto" or "prt" in MATLAB's Command Window.


    PRoNTo interfaces with a few libraries (e.g. LIBSVM) whose code is written in C or C++ for efficiency. To call this C or C++ code, MATLAB needs an interface, called a ‘mex’ file. Like the C or C++ code itself, that interface needs to be compiled. We provide pre-compiled routines, but your system might require the interfaces to be re-compiled. This is typically the case for some Linux systems or for older versions of MATLAB (before R2014). We also noticed an issue with R2017b, in which MATLAB dropped support for some compilers. This has been fixed in R2018a, and we suggest switching versions if you have R2017b. Alternatively, the manual contains detailed instructions on how to re-compile the needed interfaces.

    For SVM specifically, please also ensure that PRoNTo is at the top of your MATLAB path, as some MATLAB toolboxes interfere with the interfaces.


    These usually happen when you run PRoNTo in a version of MATLAB that is not supported. Please check the system requirements here.

    If you are still experiencing problems, please restart MATLAB.

    If you are running a supported version of MATLAB and are still experiencing problems after restarting, please refer to the PRoNTo Mailing List for help.


    Users of MATLAB R2016a, R2017a and later versions may encounter compiler-related errors when running PRoNTo for the first time. Please find a tutorial here to fix these issues.



    In version 2, PRoNTo only accepts NIFTI images as input. As long as your connectivity data (e.g. connectivity matrices) have been converted to NIFTI images, they can be used in PRoNTo. Version 3 will allow the user to input non-imaging data (such as data stored in .mat files). We hope to release a beta version soon.


    The names can be anything, as long as they contain only alphanumeric characters (i.e. a to z, 0 to 9) or underscores (_). Special characters such as white spaces or dashes should be avoided and might throw an error if used.

    In the batch, the naming has to be consistent from one step to the next, i.e. a modality should always be referred to with exactly the same name. Letter case (i.e. lower or upper case) should not be an issue, but to be on the safe side we recommend keeping the same case as well. See next question for an example.


    In the batch, the name of each modality in the Masks option should be exactly the same as the name of the corresponding modality specified in Groups. If the names do not match, there will be an error. For instance, if there are 3 modalities called Run1, Run2 and Run3, there should be 3 masks, whose modality names are Run1, Run2 and Run3, respectively. The same mask file can be used for the 3 modalities though, as long as the names of the modalities in Masks are exactly the same as the names in Groups.
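
    As a rough illustration of the two naming rules above, here is a small Python sketch that can be run outside PRoNTo before filling in a batch. The function `check_names` and the pattern it uses are our own assumptions for illustration, not PRoNTo's internal check. The comparison is deliberately case-sensitive, in line with the recommendation to keep the same case everywhere.

```python
import re

# Assumption: a valid name contains only letters, digits and underscores,
# as described in the text. This is an illustrative check, not PRoNTo's own.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_names(modality_names, mask_names):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name in list(modality_names) + list(mask_names):
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid characters in name: {name!r}")
    # Every modality needs a mask with exactly the same name, and vice versa.
    for name in sorted(set(modality_names) - set(mask_names)):
        problems.append(f"modality without a mask: {name!r}")
    for name in sorted(set(mask_names) - set(modality_names)):
        problems.append(f"mask without a modality: {name!r}")
    return problems


print(check_names(["Run1", "Run2", "Run3"], ["Run1", "Run2", "Run3"]))  # []
for problem in check_names(["Run1", "Run 2"], ["Run1", "run2"]):
    print(problem)
```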


    For all modalities, a ‘1st-level’ mask is required. This mask should have a value of 1 at voxels that have intensity values in all subjects and conditions and 0 elsewhere. This masking is used to identify which voxels are ‘within the brain’ across all input images and reduce memory usage at the ‘Feature Set/Kernel’ step.

    Please note that, as mentioned, the mask should correctly identify the voxels that have intensity values across all input images. This requires special attention when dealing with beta and contrast images, which often include Not a Number (NaN) values at different locations across images. You should ensure that the mask excludes those NaN values, either by building your own mask (e.g. using SPM’s ImCalc) or by using the script provided in PRoNTo\utils (see manual). If the message 'Warning: NaNs found in loaded data' is displayed in MATLAB while the feature set is being built, please revise your mask.
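
    The logic of such a mask can be sketched in a few lines of Python. This is a toy example on flattened images stored as plain lists, not the script shipped in PRoNTo\utils. We assume here that ‘having an intensity value’ means being finite and non-zero, and the function name `first_level_mask` is hypothetical.

```python
import math


def first_level_mask(images):
    """Build a binary mask from flattened images (lists of voxel values).

    A voxel gets 1 only if its value is finite and non-zero in every image
    (assumption made for this sketch), and 0 otherwise.
    """
    n_voxels = len(images[0])
    if any(len(img) != n_voxels for img in images):
        raise ValueError("all images must have the same number of voxels")
    mask = []
    for v in range(n_voxels):
        ok = all(math.isfinite(img[v]) and img[v] != 0 for img in images)
        mask.append(1 if ok else 0)
    return mask


nan = float("nan")
# Two hypothetical beta images with NaNs at different locations.
betas = [
    [0.0, 1.2, 0.7, nan, 2.1],
    [0.0, 0.9, nan, 0.4, 1.8],
]
print(first_level_mask(betas))  # [0, 1, 0, 0, 1]
```

    Note that a voxel is excluded as soon as a single image has a NaN there, which is why NaNs at different locations across images shrink the mask.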


    PRoNTo assumes that voxel 1 in image 1 corresponds to voxel 1 in image 2. This means that the input images need to be aligned and registered in the same space. PRoNTo can perform some additional preprocessing, such as detrending a time series signal (for fMRI) or scaling all the voxel intensities in an image by a single value (more typically used for PET images). In addition, any pre-processing step that increases the signal-to-noise ratio should be considered, as long as it does not involve using the targets of the machine learning model (e.g. building a mask by thresholding a GLM map contrasting faces and houses, and then using the same data to discriminate between faces and houses, is double-dipping).


    There are 2 ways to enter beta images in PRoNTo:

  • For a within-subject design, or many beta images per subject: In this case, the subjects should be entered one by one, and a design should be specified. The SPM.mat can NOT be used! Instead, the design should be specified by hand, with references to the input images, not to events in seconds. Example: one subject with 3 conditions, one beta per condition. Select ‘Specify design’, then choose to specify the conditions manually. Let’s assume that the images were selected in the order ‘beta_condA’, ‘beta_condB’, ‘beta_condC’. Specify 3 conditions, corresponding to the order the images were selected in, i.e. ‘condA’ with onset 0 and duration 1, ‘condB’ with onset 1 and duration 1 and ‘condC’ with onset 2 and duration 1. Note: you can also save the described ‘names’, ‘onsets’ and ‘durations’ in a .mat file and load this file. The units need to be set to ‘Scans’ and the TR to 1. This tells PRoNTo to derive the design in images/TRs, not in seconds. Make sure to leave the HRF parameters at their default values of 0, as the HRF shape has already been accounted for in the estimation of the betas.
  • For an across-subjects design, or a few beta images and plenty of subjects: In this case, groups can be used instead. For each beta condition, create a group (e.g. ‘condA’) and select by ‘Scans’. Enter the images in the order of your subjects. If you have multiple beta conditions, repeat the process by creating another group, maintaining the same order of subject selection. This is crucial to ensure proper cross-validation: PRoNTo does not know that subject 1 from group ‘condA’ is the same as subject 1 from group ‘condB’. If the order is not maintained, the same subject could end up in both the train and test sets in a fold. This would lead to an over-optimistic estimate of model performance. This is referred to as ‘leakage’ in machine learning, i.e. correlated information (here the same subject) is present in both the train and test sets.
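
    The two cases above can be sketched in Python as follows. This is only an illustration of the bookkeeping, not PRoNTo code: the helper names `design_from_image_order` and `same_subject_order`, and the subject labels, are hypothetical. The first helper derives ‘names’, ‘onsets’ and ‘durations’ in scans from the order in which the beta images were selected. The second checks that every group lists the subjects in the same order.

```python
def design_from_image_order(condition_per_image):
    """Derive names, onsets and durations (in scans) from the image order.

    condition_per_image: condition label of each selected image, in selection
    order. The first image has onset 0 and every image has duration 1.
    """
    names, onsets, durations = [], [], []
    for index, cond in enumerate(condition_per_image):
        if cond not in names:
            names.append(cond)
            onsets.append([])
            durations.append([])
        k = names.index(cond)
        onsets[k].append(index)
        durations[k].append(1)
    return names, onsets, durations


def same_subject_order(group_to_subjects):
    """True if all groups list their subjects in exactly the same order."""
    orders = list(group_to_subjects.values())
    return all(order == orders[0] for order in orders)


# Within-subject case: images selected as beta_condA, beta_condB, beta_condC.
names, onsets, durations = design_from_image_order(["condA", "condB", "condC"])
print(names)      # ['condA', 'condB', 'condC']
print(onsets)     # [[0], [1], [2]]
print(durations)  # [[1], [1], [1]]

# Across-subjects case: the second group swaps two subjects, which risks leakage.
groups = {"condA": ["s01", "s02", "s03"], "condB": ["s01", "s03", "s02"]}
print(same_subject_order(groups))  # False
```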


    PRoNTo comes with a couple of options for cross-validation. Which cross-validation is used will have an impact on the estimated model and on its performance. Here are a couple of recommendations, based on empirical evidence and PRoNTo’s implementation:

  • Leave Subjects Out: Leaves a given proportion of subjects out for testing while using the others for training. This scheme works well when subjects have a design and the classes were defined based on this design. For example, 40 subjects with ‘condA’ and ‘condB’ conditions, entered either as beta images (with design, not groups) or as fMRI time series. It is NOT appropriate for classifying groups that were defined on the same subjects (e.g. beta images entered as groups, see related question). It is not ideal for classifying groups (e.g. healthy vs. diseased) either, as it leads to imbalance between the classes in the train and in the test set. For example, with 50 healthy and 50 diseased subjects and 5-fold CV leaving subjects out, fold 1 is: train = 30 healthy + 50 diseased, test = 20 healthy.
  • Leave Subjects per Class Out: Leaves out the same proportion of subjects from each defined class for testing (e.g. leaves 10% healthy and 10% diseased out). This scheme works well when subjects are matched across groups and those groups define the classes. For example, healthy vs. diseased, with subject 1 in healthy matched based on age and gender to subject 1 in diseased. It is the only good choice if the subjects in the different groups are correlated (as for beta images from the same subjects, see question 2.6). In both cases, care must be taken at the Data and Design step to enter the images in the correct order. This CV scheme is however NOT appropriate when the classes are defined based on the conditions of a within-subject design (e.g. the example in the previous paragraph).
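
    The difference between the two schemes can be made concrete with a toy Python sketch. It reproduces the 50 healthy / 50 diseased example with 5 folds. We assume, for illustration only, that folds are contiguous blocks in the order the subjects were entered; this is not PRoNTo's actual implementation, and all function names are hypothetical.

```python
def contiguous_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def leave_subjects_out(labels, k):
    """Test folds over all subjects in entry order, ignoring class labels."""
    return contiguous_folds(len(labels), k)


def leave_subjects_per_class_out(labels, k):
    """Test folds built separately within each class, then merged fold by fold."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        for f, chunk in enumerate(contiguous_folds(len(members), k)):
            folds[f].extend(members[j] for j in chunk)
    return folds


def class_counts(indices, labels):
    """Count how many subjects of each class appear in a set of indices."""
    counts = {}
    for i in indices:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    return counts


labels = ["healthy"] * 50 + ["diseased"] * 50

fold1 = leave_subjects_out(labels, 5)[0]
print(class_counts(fold1, labels))     # {'healthy': 20}

fold1_pc = leave_subjects_per_class_out(labels, 5)[0]
print(class_counts(fold1_pc, labels))  # {'diseased': 10, 'healthy': 10}
```

    With subjects left out regardless of class, the first test fold contains healthy subjects only, whereas the per-class scheme keeps both test and train sets balanced.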

    Leave-One-Out cross-validation has been shown to be over-optimistic for neuroimaging applications (Varoquaux et al., 2016), so we recommend using k-fold cross-validation instead. Values of 5 or 10 for k (i.e. leaving 20% or 10% out, respectively) have been suggested to provide a good trade-off between fair estimates of generalization ability and model performance (Hastie and Tibshirani, 2013).

    The number of folds chosen will affect model performance. All cross-validation schemes that were tried should be reported, along with their model performance. We also recommend reporting not only the mean cross-validation performance across folds, but also its variance or standard deviation (Varoquaux, 2017).
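
    As a minimal sketch of such a report, the per-fold performances can be summarised with Python's standard library. The accuracy values below are hypothetical and only serve to show the computation.

```python
import statistics

# Hypothetical per-fold accuracies from a 5-fold cross-validation.
fold_accuracies = [0.70, 0.65, 0.80, 0.75, 0.60]

mean_acc = statistics.mean(fold_accuracies)
sd_acc = statistics.stdev(fold_accuracies)  # sample standard deviation

print(f"accuracy: {mean_acc:.2f} +/- {sd_acc:.2f} "
      f"(SD over {len(fold_accuracies)} folds)")
```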



    In regression problems, a negative correlation would indicate that the model is not learning from the data, i.e. it is not finding an association between the training patterns (NIFTI images) and the targets (e.g. clinical scores). Often, if the model has not learnt a relationship between the training patterns and the targets, it predicts an ‘average’ of the training targets for the test data, with some noise. This can lead to negative correlations between the predicted and the real targets, as the lower targets will be overestimated and the higher targets will be underestimated.
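
    This effect is easy to reproduce in a toy Python example. We take the extreme, noise-free case of a ‘model’ that always predicts the mean of its training targets, evaluated with leave-one-out for simplicity. The target values are hypothetical. When a low target is left out, the training mean goes up, and vice versa, so predictions and real targets end up perfectly anti-correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leave_one_out_mean_predictions(targets):
    """Predict each left-out target as the mean of the remaining targets."""
    total, n = sum(targets), len(targets)
    return [(total - t) / (n - 1) for t in targets]


targets = [10.0, 12.0, 15.0, 19.0, 24.0]  # hypothetical clinical scores
preds = leave_one_out_mean_predictions(targets)
r = pearson(preds, targets)
print(round(r, 6))  # -1.0
```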

    Performing a significance test of (e.g.) mean-squared error using permutation tests should provide evidence on whether the model is learning from the data or not. In general, mean-squared error is a much better metric than correlation for evaluating regression and should be the primary measure used to evaluate performance of regression models even if the correlation between predicted and real targets is positive.
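
    The idea of a permutation test on the mean-squared error can be sketched as follows in Python. This is a toy illustration with a single feature, a least-squares line as the ‘model’ and hypothetical data, not PRoNTo's implementation. The targets are shuffled, the cross-validated error is recomputed each time, and the p-value is the proportion of permutations reaching an error at least as low as the observed one.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def kfold_mse(x, y, k=4):
    """Cross-validated mean-squared error; fold f tests items with i % k == f."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        errors.extend((slope * x[i] + intercept - y[i]) ** 2 for i in test)
    return sum(errors) / len(errors)


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Return the observed CV error and its permutation p-value."""
    rng = random.Random(seed)
    observed = kfold_mse(x, y)
    count = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # breaks the link between feature and target
        if kfold_mse(x, shuffled) <= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]  # hypothetical, nearly linear
observed, p = permutation_p_value(x, y)
print(f"observed MSE = {observed:.3f}, p = {p:.3f}")
```

    The whole cross-validation is repeated for every permutation, so that the null distribution reflects the complete modelling procedure.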


    Again, this can happen if the model is unable to learn any relationship between the training patterns and the labels and should not be interpreted as if the model was learning a relationship between the images and the “flipped” labels.


    There are different ways of extracting features from fMRI data. In order to account for the HRF properties, one can run a General Linear Model (GLM) analysis for each subject a priori and use either the GLM coefficients (beta images) or the contrast images derived from the GLM as input in PRoNTo. The choice between using beta or contrast images depends on the research question, and different choices can lead to different results.

    For example, assume you have 2 groups (healthy vs diseased), where each subject is performing a task with 2 conditions (A and B). For each subject, a GLM is built, giving one beta image for A, one for B, as well as the contrast A-B. In the healthy group, a model discriminates A from B significantly better than chance. In the diseased group, A and B cannot be discriminated based on this data. It might then be reasonable to think that the contrast A-B would discriminate the 2 groups.

    In practice however, a few users reported that the healthy vs diseased discrimination based on the contrast did not lead to a significant result. This might be due to the signal variance in the 2 conditions. For example, the mean of the signal could be the same in beta_A and in beta_B in both groups, but the variance of the signal in beta_A and in beta_B is different in one group (here the healthy group). A difference in variance could be sufficient to obtain significant discrimination between A and B in the healthy group. But when using the contrast, the means of the conditions are subtracted and the difference in variance is lost (Hebart and Baker, 2017).


    Weight maps are a spatial representation of the predictive function and show the relative contribution of all voxels for the model.

    The map cannot be thresholded, as each voxel and its weight contribute to the final prediction. Relating the amplitude of the weight to potential brain sources can also be misleading. For more information on weight interpretation, please see (non-exhaustive list):

  • Haufe et al., 2014
  • Kia et al., 2017
  • Hebart and Baker, 2017
  • Schrouff and Mourão-Miranda, 2018 (or here)

    For clarity, let us first consider the case of the image consisting of a single voxel, v1. The predictive linear model is then

    f=v1*w1+b

    where b is the intercept term. In that case the weight w1 at that voxel is the change in the predictions, f, per unit change in the signal at v1.

    For example if w1=3, the predicted target will increase by 3 every time the signal at v1 increases by 1, so subjects with a higher signal at v1 will have higher predicted targets.

    If the weight w1=-2, the predicted target will decrease by 2 every time the signal at v1 increases by 1. Now subjects with a higher signal at v1 will have lower predicted targets.

    For images containing many voxels, we can think of it in a similar way, but holding the image values at other voxels fixed. So if we have voxels v1 and v2, with predictive model

    f=v1*w1+v2*w2+b

    and weights w1=3 and w2=-2, predicted targets will increase by 3 per unit increase in the signal at v1, when holding the signal at v2 constant. Similarly, predicted targets will decrease by 2 per unit increase in the signal at v2, when holding the signal at v1 constant.
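
    The two-voxel example can be checked numerically with a short Python sketch. The intercept value and the signal values are arbitrary choices made for this illustration.

```python
def predict(signal, weights, intercept):
    """Linear predictive function f = v1*w1 + v2*w2 + ... + b."""
    return sum(v * w for v, w in zip(signal, weights)) + intercept


weights = [3.0, -2.0]  # w1 = 3, w2 = -2, as in the text
b = 0.5                # arbitrary intercept chosen for the example

base = predict([1.0, 1.0], weights, b)
up_v1 = predict([2.0, 1.0], weights, b)  # v1 increased by 1, v2 held constant
up_v2 = predict([1.0, 2.0], weights, b)  # v2 increased by 1, v1 held constant

print(up_v1 - base)  # 3.0
print(up_v2 - base)  # -2.0
```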

    However, we cannot say that a positive weight means that the corresponding voxel is more ‘active’ for higher values of the target. While a higher signal value could be an explanation, other sources can lead to a non-zero weight in a voxel, such as the noise structure. Please see references in question 4.4 for more details.


    We can interpret the weight vector in a similar way to that for regression (see previous question). Consider again the example with voxels v1 and v2, with weights w1=3 and w2=-2.

    As before, a unit increase in the signal at v1 will increase the function value f by 3, when holding the signal at v2 constant. However, unlike the regression case, f is not the predicted target but indicates either the signed distance from the classification boundary (SVM/MKL), or the log-odds-ratio, log(p(class1)/p(class2)) (Gaussian Process Classification). So for the SVM case, a unit increase in the signal at v1 will add 3 to the signed distance from the classification boundary (moving towards class 1), while for the GPC it will add 3 to the log-odds-ratio (i.e. multiply the odds p(class1)/p(class2) by e^3) and so increase the probability of being class 1.
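
    If f is read as a log-odds, as described above, an increase of 3 in f adds 3 to the log-odds, which multiplies the odds by e^3. The short Python sketch below illustrates this reading only. The starting value of f is arbitrary, and the sketch makes no claim about how PRoNTo computes class probabilities internally.

```python
import math


def probability_from_log_odds(f):
    """Convert a log-odds value f = log(p / (1 - p)) into the probability p."""
    return 1.0 / (1.0 + math.exp(-f))


f_before = 0.0             # arbitrary starting point: both classes equally likely
f_after = f_before + 3.0   # unit increase at v1 with w1 = 3

p_before = probability_from_log_odds(f_before)
p_after = probability_from_log_odds(f_after)
odds_ratio = (p_after / (1 - p_after)) / (p_before / (1 - p_before))

print(round(p_before, 3), round(p_after, 3))        # 0.5 0.953
print(round(odds_ratio, 3), round(math.exp(3), 3))  # 20.086 20.086
```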

    We cannot say that a negative weight in a voxel means that the corresponding voxel is ‘activated’ in class 2. The ‘source’ of the weight can be related to voxel activity but also to noise structure differences between the 2 classes. Please see references in question 4.4 for more details.