
prt_rvr

PURPOSE

Optimisation for Relevance Vector Regression

SYNOPSIS

function [varargout] = prt_rvr(varargin)

DESCRIPTION

 Optimisation for Relevance Vector Regression

 [w,alpha,beta,ll] = prt_rvr(Phi,t)
 Phi   - MxM matrix derived from kernel function of vector pairs
 t     - the values to be matched
 w     - weights
 alpha - 1/variance for the prior part of the model
 beta  - 1/variance for the likelihood part of the model
 ll    - the negative log-likelihood.
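
 A minimal usage sketch for this first form (the data below are
 hypothetical, and a linear kernel is assumed):

     X   = randn(50,10);                    % hypothetical training features (N x D)
     t   = X*randn(10,1) + 0.1*randn(50,1); % hypothetical targets
     Phi = X*X';                            % N x N linear kernel matrix

     [w,alpha,beta,ll] = prt_rvr(Phi,t);

     % w has N+1 elements: one weight per training sample plus a bias,
     % so predictions take the form [K ones]*w
     Xs   = randn(5,10);                    % hypothetical test samples
     pred = [Xs*X' ones(5,1)]*w;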

 [w,alpha,beta,nu,ll] = prt_rvr(K,t,opt)
 K     - a cell-array of MxM dot-product matrices.
 t     - the values to be matched
 opt   - either 'Linear' or 'Gaussian RBF'
         'Linear'       is for linear regression models, where
                        the optimal kernel is generated by
                        [nu(1)*K{1} + nu(2)*K{2} + ... ones(size(K{1},1),1)]
         'Gaussian RBF' is for regression using Gaussian radial basis
                        functions.  The kernel is generated from
                        P1  = nu(1)*K{1} + nu(2)*K{2} + ... ;
                        P2  = repmat(diag(P1) ,1,size(P1,2)) +...
                              repmat(diag(P1)',size(P1,1),1) - 2*P1;
                        Phi = [exp(-0.5*P2) ones(size(P1,1),1)];
 w     - weights
 alpha - 1/variance for the prior part of the model
 beta  - 1/variance for the likelihood part of the model
 nu    - parameters that convert the dot-product matrices into
         a kernel matrix (Phi).
 ll    - the negative log-likelihood.
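
 A sketch of this multiple-kernel form, assuming two hypothetical
 feature sets X1 and X2 (all variable names are illustrative):

     X1 = randn(50,10); X2 = randn(50,20);  % hypothetical feature sets
     t  = randn(50,1);                      % hypothetical targets
     K  = {X1*X1', X2*X2'};                 % one dot-product matrix per set

     [w,alpha,beta,nu,ll] = prt_rvr(K,t,'Linear');

     % nu weights the kernels; fitted values use the combined kernel
     Phi  = nu(1)*K{1} + nu(2)*K{2};
     pred = [Phi ones(50,1)]*w;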

 The first way of calling the routine simply optimises the
 weights.  This involves estimating a restricted maximum
 likelihood (REML) solution, which maximises P(alpha,beta|t,Phi).
 Note that REML is also known as Type II Maximum Likelihood
 (ML-II). The ML-II solution tends towards infinite values for
 some of the regularisation terms (i.e. 1/alpha(i) approaches 0).
 The corresponding columns are removed from the model when
 this happens.
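
 Continuing the first sketch above, the surviving 'relevance
 vectors' can be read off from the zero pattern of the returned
 weights:

     rv = find(w(1:end-1) ~= 0);  % indices of the retained samples
     fprintf('%d of %d samples retained\n', numel(rv), numel(t));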

 The second way of calling the routine also estimates additional
 input scale parameters as described in Appendix C of Tipping (2001).
 This method is much slower, as a full optimisation for the scale
 parameters is done after each update of the alphas and beta.
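
 For the 'Gaussian RBF' option, fitted values require rebuilding Phi
 from the returned nu using the formula above; a sketch with a single
 hypothetical kernel:

     X = randn(50,10); t = randn(50,1);     % hypothetical data
     K = {X*X'};

     [w,alpha,beta,nu,ll] = prt_rvr(K,t,'Gaussian RBF');

     P1   = nu(1)*K{1};
     P2   = repmat(diag(P1),1,size(P1,2)) + ...
            repmat(diag(P1)',size(P1,1),1) - 2*P1;
     Phi  = [exp(-0.5*P2) ones(size(P1,1),1)];
     pred = Phi*w;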

 see: http://research.microsoft.com/mlp/RVM/relevance.htm

 Refs:
 Michael E. Tipping (2000). The Relevance Vector Machine.
 In S. A. Solla, T. K. Leen, and K.-R. Müller (Eds.),
 Advances in Neural Information Processing Systems 12,
 pp. 652-658. Cambridge, MA: MIT Press.

 Michael E. Tipping (2001). Sparse Bayesian Learning and the
 Relevance Vector Machine. Journal of Machine Learning
 Research 1, 211-244.
________________________________________________________________________
 Copyright (C) 2011 Wellcome Department of Imaging Neuroscience & Machine Learning & Neuroimaging Laboratory

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function [w,alpha,beta,ll]=regression0(Phi,t)
function [w,alpha,beta,ll] = rvr1a(Phi,t,alpha,beta)
function [w,alpha,beta,ll]=rvr2a(Phi,t,alpha,beta)
function [w,alpha,beta,nu,ll]=regression1(K,t,opt)
function [w,alpha,beta,nu,ll] = rvr1(K,t,alpha,beta,nu,krn_f,dkrn_f)
function [w,alpha,beta,nu,ll]=rvr2(K,t,alpha,beta,nu,krn_f,dkrn_f)
function Phi = make_phi(nu,K,nz)
function [dPhi,d2Phi] = make_dphi(nu,K,nz)
function Phi = make_phi_rbf(nu,K,nz)
function [dPhi,d2Phi] = make_dphi_rbf(nu,K,nz)
function [nu,ll] = re_estimate_nu(K,t,nu,alpha,beta,krn_f,dkrn_f,nz)
function [ld,C] = logdet(A)

SOURCE CODE

0001 function [varargout] = prt_rvr(varargin)
0002 % Optimisation for Relevance Vector Regression
0003 %
0004 % [w,alpha,beta,ll] = prt_rvr(Phi,t)
0005 % Phi   - MxM matrix derived from kernel function of vector pairs
0006 % t     - the values to be matched
0007 % w     - weights
0008 % alpha - 1/variance for the prior part of the model
0009 % beta  - 1/variance for the likelihood part of the model
0010 % ll    - the negative log-likelihood.
0011 %
0012 % [w,alpha,beta,nu,ll] = prt_rvr(K,t,opt)
0013 % K     - a cell-array of MxM dot-product matrices.
0014 % t     - the values to be matched
0015 % opt   - either 'Linear' or 'Gaussian RBF'
0016 %         'Linear'       is for linear regression models, where
0017 %                        the optimal kernel is generated by
0018 %                        [nu(1)*K{1} + nu(2)*K{2} + ... ones(size(K{1},1),1)]
0019 %         'Gaussian RBF' is for regression using Gaussian radial basis
0020 %                        functions.  The kernel is generated from
0021 %                        P1  = nu(1)*K{1} + nu(2)*K{2} + ... ;
0022 %                        P2  = repmat(diag(P1) ,1,size(P1,2)) +...
0023 %                              repmat(diag(P1)',size(P1,1),1) - 2*P1;
0024 %                        Phi = [exp(-0.5*P2) ones(size(P1,1),1)];
0025 % w     - weights
0026 % alpha - 1/variance for the prior part of the model
0027 % beta  - 1/variance for the likelihood part of the model
0028 % nu    - parameters that convert the dot-product matrices into
0029 %         a kernel matrix (Phi).
0030 % ll    - the negative log-likelihood.
0031 %
0032 % The first way of calling the routine simply optimises the
0033 % weights.  This involves estimating a restricted maximum
0034 % likelihood (REML) solution, which maximises P(alpha,beta|t,Phi).
0035 % Note that REML is also known as Type II Maximum Likelihood
0036 % (ML-II). The ML-II solution tends towards infinite values for
0037 % some of the regularisation terms (i.e. 1/alpha(i) approaches 0).
0038 % The corresponding columns are removed from the model when
0039 % this happens.
0040 %
0041 % The second way of calling the routine also estimates additional
0042 % input scale parameters as described in Appendix C of Tipping (2001).
0043 % This method is much slower, as a full optimisation for the scale
0044 % parameters is done after each update of the alphas and beta.
0045 %
0046 % see: http://research.microsoft.com/mlp/RVM/relevance.htm
0047 %
0048 % Refs:
0049 % Michael E. Tipping (2000). The Relevance Vector Machine.
0050 % In S. A. Solla, T. K. Leen, and K.-R. Müller (Eds.),
0051 % Advances in Neural Information Processing Systems 12,
0052 % pp. 652-658. Cambridge, MA: MIT Press.
0053 %
0054 % Michael E. Tipping (2001). Sparse Bayesian Learning and the
0055 % Relevance Vector Machine. Journal of Machine Learning
0056 % Research 1, 211-244.
0057 %________________________________________________________________________
0058 % Copyright (C) 2011 Wellcome Department of Imaging Neuroscience & Machine Learning & Neuroimaging Laboratory
0059 
0060 % Written by John Ashburner
0061 % $Id: prt_rvr.m 542 2012-05-20 10:48:51Z cphillip $
0062 
0063 if isnumeric(varargin{1}),   % a kernel matrix was supplied directly
0064     [varargout{1:nargout}]=regression0(varargin{:});
0065 elseif iscell(varargin{1}),  % a cell array of dot-product matrices
0066     [varargout{1:nargout}]=regression1(varargin{:});
0067 else
0068     error('Incorrect usage');
0069 end;
0070 return;
0071 %__________________________________________________________________________
0072 
0073 %__________________________________________________________________________
0074 function [w,alpha,beta,ll]=regression0(Phi,t)
0075 [N,M]  = size(Phi);
0076 if N==M,
0077     Phi = [Phi ones(N,1)];
0078 elseif M~=N+1,
0079     error('Phi must be N x (N+1)');
0080 end;
0081 scale             = sqrt(sum(sum(Phi(1:N,1:N).^2))/N^2); % RMS of the kernel part of Phi
0082 scale             = [ones(N,1)*scale ; 1];
0083 Phi               = Phi/spdiags(scale,0,numel(scale),numel(scale));
0084 alpha             = ones(size(Phi,2),1)/N;
0085 %beta             = N/sum((t-mean(t)).^2);
0086 beta              = 1e6;
0087 [w,alpha,beta,ll] = rvr1a(Phi,t,alpha,beta);
0088 alpha             = [alpha(1)*ones(N,1) ; alpha(2)];
0089 [w,alpha,beta,ll] = rvr2a(Phi,t,alpha,beta);
0090 w                 = w./scale;
0091 alpha             = alpha.*scale.^2;
0092 return;
0093 %__________________________________________________________________________
0094 
0095 %__________________________________________________________________________
0096 function [w,alpha,beta,ll] = rvr1a(Phi,t,alpha,beta)
0097 % Non-sparse ML-II estimation, used to initialise rvr2a
0098 %spm_chi2_plot('Init','ML-II (non-sparse)','-Log-likelihood','Iteration');
0099 [N,M]   = size(Phi);
0100 ll      = Inf;
0101 PP      = Phi'*Phi;
0102 Pt      = Phi'*t;
0103 for subit=1:10,
0104     alpha_old = alpha;
0105     beta_old  = beta;
0106 
0107     % E-step
0108     S         = inv(PP*beta + spdiags([ones(N,1)*alpha(1) ; alpha(2)],0,N+1,N+1));
0109     w         = S*(Pt*beta);
0110 
0111    % figure(3); plot(t,Phi*w,'.'); drawnow;
0112 
0113     tmp       = t-Phi*w;
0114     ll        = ...
0115          -0.5*log(alpha(1))*N-0.5*log(alpha(2))-0.5*N*log(beta)-0.5*logdet(S)...
0116          +0.5*tmp'*tmp*beta + 0.5*sum(w.^2.*[repmat(alpha(1),N,1) ; alpha(2)])...
0117          +0.5*(M-N)*log(2*pi);
0118   %  if subit>1, spm_chi2_plot('Set',ll); end;
0119     %fprintf('%g\n',ll);
0120 
0121     % M-step
0122     ds        = diag(S);
0123     dfa1      = sum(ds(1:N))*alpha(1);
0124     dfa2      = sum(ds(N+1))*alpha(2);
0125     alpha(1)  = max(N-dfa1,eps)/(sum(w(1:N).^2)   +eps);
0126     alpha(2)  = max(1-dfa2,eps)/(sum(w(N+1).^2)   +eps);
0127     beta      = max(dfa1+dfa2-1,eps)/(sum((Phi*w-t).^2)+eps);
0128 
0129     % Convergence
0130     if max(max(abs(log((alpha+eps)./(alpha_old+eps)))),log(beta/beta_old)) < 1e-9,
0131         break;
0132     end;
0133 end;
0134 %spm_chi2_plot('Clear');
0135 return;
0136 %__________________________________________________________________________
0137 
0138 %__________________________________________________________________________
0139 function [w,alpha,beta,ll]=rvr2a(Phi,t,alpha,beta)
0140 %spm_chi2_plot('Init','ML-II (sparse)','-Log-likelihood','Iteration');
0141 [N,M] = size(Phi);
0142 nz    = true(M,1);
0143 
0144 PP    = Phi'*Phi;
0145 Pt    = Phi'*t;
0146 
0147 for subit=1:200,
0148     th         = min(alpha)*1e9; % prune terms whose alpha exceeds 1e9 times the smallest
0149     nz         = alpha<th;
0150     alpha(~nz) = th*1e9;
0151     alpha_old  = alpha;
0152     beta_old   = beta;
0153 
0154     % E-step
0155     S         = inv(PP(nz,nz)*beta + diag(alpha(nz)));
0156     w         = S*Pt(nz)*beta;
0157 
0158   %  figure(3); plot(t,Phi(:,nz)*w,'.'); drawnow;
0159 
0160     tmp = t-Phi(:,nz)*w;
0161     ll  = ...
0162         -0.5*sum(log(alpha(nz)+1e-32))-0.5*N*log(beta+1e-32)-0.5*logdet(S)...
0163         +0.5*tmp'*tmp*beta + 0.5*sum(w.^2.*alpha(nz))...
0164         +0.5*(sum(nz)-N)*log(2*pi);
0165 %    if subit>0, spm_chi2_plot('Set',ll); end;
0166     %fprintf('%g\t%g\n',ll,exp(mean(log(alpha)))/beta);
0167 
0168     % M-step
0169     gam       = 1 - alpha(nz).*diag(S); % effective degrees of freedom (gamma in Tipping, 2001)
0170     alpha(nz) = max(gam,eps)./(w.^2+1e-32);
0171     beta      = max(N-sum(gam),eps)./(sum((Phi(:,nz)*w-t).^2)+1e-32);
0172 
0173     % Convergence
0174     if max(max(abs(log((alpha(nz)+eps)./(alpha_old(nz)+eps)))),log(beta/beta_old)) < 1e-6*N,
0175         break;
0176     end;
0177 end;
0178 w(nz)  = w;
0179 w(~nz) = 0;
0180 w      = w(:);
0181 %spm_chi2_plot('Clear');
0182 %__________________________________________________________________________
0183 
0184 %__________________________________________________________________________
0185 function [w,alpha,beta,nu,ll]=regression1(K,t,opt)
0186 % Relevance vector regression
0187 if nargin<3, opt = 'Linear'; end;
0188 switch opt,
0189 case {'Linear','linear','lin'},
0190     dkrn_f = @make_dphi;
0191     krn_f  = @make_phi;
0192 case {'Gaussian RBF','nonlinear','nonlin'},
0193     dkrn_f = @make_dphi_rbf;
0194     krn_f  = @make_phi_rbf;
0195 otherwise
0196     error('Unknown option');
0197 end;
0198 [N,M]  = size(K{1});
0199 nu     = ones(numel(K),1);
0200 rescal = ones(numel(K),1);
0201 for i=1:numel(K),
0202     if strcmpi(opt,'Gaussian RBF') || strcmpi(opt,'nonlinear') || strcmpi(opt,'nonlin'),
0203         d         = 0.5*diag(K{i});
0204         K{i}      = repmat(d,[1 size(K{i},1)]) + repmat(d',[size(K{i},1),1]) - K{i};
0205         K{i}      = max(K{i},0);
0206         K{i}      = -K{i};
0207         nu(i)     = 1/sqrt(sum(K{i}(:).^2)/(size(K{i},1).^2-size(K{i},1)));
0208     else
0209         rescal(i) = sqrt(size(K{i},1)/sum(K{i}(:).^2));
0210         K{i}      = K{i}.*rescal(i);
0211     end;
0212 end;
0213 
0214 alpha  = [1 1]';
0215 %beta  = 1/sum((t-mean(t)).^2);
0216 beta   = 1e6;
0217 [w,alpha,beta,nu,ll]=rvr1(K,t,alpha,beta,nu,krn_f,dkrn_f);
0218 alpha  = [alpha(1)*ones(N,1) ; alpha(2)];
0219 [w,alpha,beta,nu,ll]=rvr2(K,t,alpha,beta,nu,krn_f,dkrn_f);
0220 nu    = nu.*rescal;
0221 return;
0222 %__________________________________________________________________________
0223 
0224 %__________________________________________________________________________
0225 function [w,alpha,beta,nu,ll] = rvr1(K,t,alpha,beta,nu,krn_f,dkrn_f)
0226 % Non-sparse ML-II estimation, used to initialise rvr2
0227 spm_chi2_plot('Init','ML-II (non-sparse)','-Log-likelihood','Iteration');
0228 [N,M]   = size(K{1});
0229 ll      = Inf;
0230 for iter=1:50,
0231     Phi     = feval(krn_f,nu,K);
0232     for subit=1:1,
0233         alpha_old = alpha;
0234         beta_old  = beta;
0235 
0236         % E-step
0237         S         = inv(Phi'*Phi*beta + spdiags([ones(N,1)*alpha(1) ; alpha(2)],0,N+1,N+1));
0238         w         = S*(Phi'*t*beta);
0239 
0240         % M-step
0241         ds        = diag(S);
0242         dfa1      = sum(ds(1:N))*alpha(1);
0243         dfa2      = sum(ds(N+1))*alpha(2);
0244         alpha(1)  = max(N-dfa1,eps)/(sum(w(1:N).^2)   +eps);
0245         alpha(2)  = max(1-dfa2,eps)/(sum(w(N+1).^2)   +eps);
0246         beta      = max(dfa1+dfa2-1,eps)/(sum((Phi*w-t).^2)+eps);
0247 
0248         % Convergence
0249         if max(max(abs(log((alpha+eps)./(alpha_old+eps)))),log(beta/beta_old)) < 1e-9,
0250             break;
0251         end;
0252     end;
0253 
0254 
0255     % Update nu
0256     oll       = ll;
0257     al1       = [ones(N,1)*alpha(1) ; alpha(2)];
0258     [nu,ll]   = re_estimate_nu(K,t,nu,al1,beta,krn_f,dkrn_f);
0259 
0260 %    scale = sqrt(sum(nu.^2));
0261 %    nu    = nu/scale;
0262 %    alpha = alpha/scale^2;
0263 
0264     spm_chi2_plot('Set',ll);
0265     if abs(oll-ll) < 1e-6*N, break; end;
0266 end;
0267 spm_chi2_plot('Clear');
0268 return;
0269 %__________________________________________________________________________
0270 
0271 %__________________________________________________________________________
0272 function [w,alpha,beta,nu,ll]=rvr2(K,t,alpha,beta,nu,krn_f,dkrn_f)
0273 spm_chi2_plot('Init','ML-II (sparse)','-Log-likelihood','Iteration');
0274 [N,M] = size(K{1});
0275 w     = zeros(N+1,1);
0276 ll    = Inf;
0277 for iter=1:100,
0278     for subits=1:1,
0279         % Suboptimal estimates of nu if the alphas and weights are pruned
0280         % too quickly.
0281 
0282         th         = min(alpha)*1e9;
0283         nz         = alpha<th;
0284         alpha(~nz) = th*1e9;
0285 
0286         alpha_old  = alpha;
0287         beta_old   = beta;
0288         Phi        = feval(krn_f,nu,K,nz);
0289 
0290         % E-step
0291         S         = inv(beta*Phi'*Phi + diag(alpha(nz)));
0292         w(nz)     = S*Phi'*t*beta;
0293         w(~nz)    = 0;
0294 
0295         % figure(3); plot(t,Phi*w(nz),'.');drawnow;
0296 
0297         % M-step
0298         gam       = 1 - alpha(nz).*diag(S);
0299         alpha(nz) = max(gam,eps)./(w(nz).^2+1e-32);
0300         beta      = max(N-sum(gam),eps)./(sum((Phi*w(nz)-t).^2)+1e-32);
0301 
0302        % Convergence
0303         if max(max(abs(log((alpha+eps)./(alpha_old+eps)))),log(beta/beta_old)) < 1e-6,
0304             break;
0305         end;
0306     end;
0307 
0308     oll       = ll;
0309     [nu,ll]   = re_estimate_nu(K,t,nu,alpha,beta,krn_f,dkrn_f,nz);
0310 
0311     % scale     = sqrt(sum(nu.^2));
0312     % nu        = nu/scale;
0313     % alpha     = alpha/scale^2;
0314 
0315     spm_chi2_plot('Set',ll);
0316 
0317     % Convergence
0318     if abs(oll-ll) < 1e-9*N,
0319         break;
0320     end;
0321 end;
0322 spm_chi2_plot('Clear');
0323 return;
0324 %__________________________________________________________________________
0325 
0326 %__________________________________________________________________________
0327 function Phi = make_phi(nu,K,nz)
0328 % Dot product matrix, generated from linear combination of dot-product
0329 % matrices.
0330 if nargin>2 && ~isempty(nz),
0331     nz1 = nz(1:size(K{1},1));
0332     nz2 = nz(size(K{1},1)+1);
0333     Phi = K{1}(:,nz1)*nu(1);
0334     for i=2:numel(K),
0335         Phi=Phi+K{i}(:,nz1)*nu(i);
0336     end;
0337     Phi = [Phi ones(size(Phi,1),sum(nz2))];
0338 else
0339     Phi = K{1}*nu(1);
0340     for i=2:numel(K),
0341         Phi=Phi+K{i}*nu(i);
0342     end;
0343     Phi = [Phi ones(size(Phi,1),1)];
0344 end;
0345 return;
0346 %__________________________________________________________________________
0347 
0348 %__________________________________________________________________________
0349 function [dPhi,d2Phi] = make_dphi(nu,K,nz)
0350 % First and second derivatives of Phi with respect to nu, where
0351 % Phi is a dot-product matrix.
0352 dPhi  = cell(size(K));
0353 d2Phi = cell(numel(K));
0354 if nargin>2 && ~isempty(nz),
0355     nz1 = nz(1:size(K{1},1));
0356     nz2 = nz(size(K{1},1)+1);
0357     for i=1:numel(K),
0358         dPhi{i} = [K{i}(:,nz1),zeros(size(K{i},1),sum(nz2))];
0359         dPhi{i} = dPhi{i}*nu(i);
0360     end;
0361 else
0362     for i=1:numel(K),
0363         dPhi{i} = [K{i},zeros(size(K{i},1),1)];
0364         dPhi{i} = dPhi{i}*nu(i);
0365     end;
0366 end;
0367 %z = sparse([],[],[],size(dPhi{1},1),size(dPhi{1},2));
0368 z = zeros(size(dPhi{1}));
0369 for i=1:numel(K),
0370     d2Phi{i,i} = nu(i)*dPhi{i};
0371     for j=(i+1):numel(K),
0372         d2Phi{i,j} = z;
0373         d2Phi{j,i} = z;
0374     end;
0375     dPhi{i} = nu(i)*dPhi{i};
0376 end;
0377 return;
0378 %__________________________________________________________________________
0379 
0380 %__________________________________________________________________________
0381 function Phi = make_phi_rbf(nu,K,nz)
0382 % Radial basis function kernel.
0383 if nargin>2 && ~isempty(nz),
0384     nz1 = nz(1:size(K{1},1));
0385     nz2 = nz(size(K{1},1)+1);
0386     Phi       = K{1}(:,nz1)*nu(1);
0387     for i=2:numel(K),
0388         Phi=Phi+K{i}(:,nz1)*nu(i);
0389     end;
0390     Phi = [exp(Phi) ones(size(Phi,1),sum(nz2))];
0391 else
0392     Phi       = K{1}*nu(1);
0393     for i=2:numel(K),
0394         Phi=Phi+K{i}*nu(i);
0395     end;
0396     Phi = [exp(Phi) ones(size(Phi,1),1)];
0397 end;
0398 return;
0399 %__________________________________________________________________________
0400 
0401 %__________________________________________________________________________
0402 function [dPhi,d2Phi] = make_dphi_rbf(nu,K,nz)
0403 % First and second derivatives of Phi with respect to nu, where
0404 % Phi is defined by radial basis functions.
0405 Phi   = make_phi_rbf(nu,K,nz);
0406 dPhi  = cell(size(K));
0407 d2Phi = cell(numel(K));
0408 if nargin>2 && ~isempty(nz),
0409     nz1 = nz(1:size(K{1},1));
0410     nz2 = nz(size(K{1},1)+1);
0411     for i=1:numel(K),
0412         dPhi{i} = [K{i}(:,nz1),zeros(size(K{i},1),sum(nz2))];
0413     end;
0414 else
0415     for i=1:numel(K),
0416         dPhi{i} = [K{i},zeros(size(K{i},1),1)];
0417     end;
0418 end;
0419 
0420 for i=1:numel(K),
0421     d2Phi{i,i} = nu(i)*dPhi{i}.*Phi.*(1+nu(i)*dPhi{i});
0422     for j=(i+1):numel(K),
0423         d2Phi{i,j} = (nu(i)*nu(j))*dPhi{i}.*dPhi{j}.*Phi;
0424         d2Phi{j,i} = d2Phi{i,j};
0425     end;
0426     dPhi{i} = nu(i)*dPhi{i}.*Phi;
0427 end;
0428 return;
0429 %__________________________________________________________________________
0430 
0431 %__________________________________________________________________________
0432 function [nu,ll] = re_estimate_nu(K,t,nu,alpha,beta,krn_f,dkrn_f,nz)
0433 % See Appendix C of Tipping (2001).  Note that a Levenberg-Marquardt
0434 % approach is used for the optimisation.
0435 if nargin<8, nz = true(size(K{1},2)+1,1); end;
0436 
0437 ll   = Inf;
0438 lam  = 1e-6;
0439 Phi  = feval(krn_f,nu,K,nz);
0440 S    = inv(Phi'*Phi*beta+diag(alpha(nz)));
0441 w    = beta*S*Phi'*t;
0442 N    = size(Phi,1);
0443 ll   = ...
0444      -0.5*sum(log(alpha(nz)))-0.5*N*log(beta)-0.5*logdet(S)...
0445      +0.5*(t-Phi*w)'*(t-Phi*w)*beta + 0.5*sum(w.^2.*alpha(nz))...
0446      +0.5*(sum(nz)-N)*log(2*pi);
0447 
0448 for subit=1:30,
0449 
0450     % Generate 1st and second derivatives of the objective function (ll).
0451     % These are derived from the first and second partial derivatives
0452     % of Phi with respect to nu.
0453     g    = zeros(numel(K),1);
0454     H    = zeros(numel(K));
0455     [dPhi,d2Phi] = feval(dkrn_f,nu,K,nz);
0456     for i=1:numel(K),
0457         tmp1  = Phi'*dPhi{i};
0458         tmp1  = tmp1+tmp1';
0459         g(i)  = 0.5*beta*(sum(sum(S.*tmp1)) + w'*tmp1*w - 2*t'*dPhi{i}*w);
0460         for j=i:numel(K),
0461             tmp    = dPhi{j}'*dPhi{i} + Phi'*d2Phi{i,j};
0462             tmp    = tmp+tmp';
0463             tmp2   = Phi'*dPhi{j};
0464             tmp2   = tmp2+tmp2';
0465             H(i,j) = sum(sum(S.*(tmp - tmp1*S*tmp2))) + w'*tmp*w - 2*w'*d2Phi{i,j}'*t;
0466             H(i,j) = 0.5*beta*H(i,j);
0467             H(j,i) = H(i,j);
0468         end;
0469     end;
0470 
0471     oll  = ll;
0472     onu  = nu;
0473 
0474     % Negative eigenvalues indicate that the solution is tending
0475     % towards a saddle point or a maximum rather than a minimum
0476     % of the negative log-likelihood
0477     lam  = max(lam,-real(min(eig(H)))*1.5);
0478 
0479     for subsubit=1:30,
0480         drawnow;
0481 
0482         % Levenberg-Marquardt update of log(nu)
0483         warning off
0484         nu        = exp(log(nu) - (H+lam*speye(size(H)))\g);
0485         warning on
0486 
0487         % Make sure the values are within a semi-reasonable range
0488         nu        = max(max(nu,1e-12),max(nu)*1e-9);
0489         nu        = min(min(nu,1e12) ,min(nu)*1e9);
0490 
0491         % Re-compute the log-likelihood
0492         Phi1      = feval(krn_f,nu,K,nz);
0493         warning off
0494         S1        = inv(Phi1'*Phi1*beta+diag(alpha(nz))); % Sigma
0495         warning on
0496         w1        = beta*S1*Phi1'*t; % weights
0497         ll        = ...
0498              -0.5*sum(log(alpha(nz)+eps))-0.5*N*log(beta+eps)-0.5*logdet(S1)...
0499              +0.5*(t-Phi1*w1)'*(t-Phi1*w1)*beta + 0.5*sum(w1.^2.*alpha(nz))...
0500              +0.5*(sum(nz)-N)*log(2*pi);
0501 
0502         % Terminate if no difference
0503         if abs(ll-oll)<1e-9,
0504             nu  = onu;
0505             ll  = oll;
0506             break;
0507         end;
0508 
0509         if ll>oll, % Solution gets worse
0510             lam = lam*10; % More regularisation required
0511             nu  = onu; % Discard new estimates of nu and ll
0512             ll  = oll;
0513         else % Solution improves
0514             lam = lam/10; % Try even less regularisation
0515             lam = max(lam,1e-12);
0516 
0517             % Use the new Phi, S and w for recomputing the
0518             % derivatives in the next iteration
0519             Phi = Phi1;
0520             S   = S1;
0521             w   = w1;
0522             break;
0523         end;
0524     end;
0525     if abs(ll-oll)<1e-9*N, break; end;
0526 end;
0527 %__________________________________________________________________________
0528 
0529 %__________________________________________________________________________
0530 function [ld,C] = logdet(A) % log-determinant of a symmetric positive-definite matrix via Cholesky
0531 A  = (A+A')/2;
0532 C  = chol(A);
0533 d  = max(diag(C),eps);
0534 ld = sum(2*log(d));
0535
