Every now and then, I get emails asking for the implementation of the PAC (proportion of ambiguous clustering) in consensus clustering.
The truth is, this code snippet should have been included in the publication in the first place, so apologies for not having done that. Below is sample code that implements PAC once you have obtained the consensus matrices (M). It uses the ConsensusClusterPlus* package in R to obtain the consensus matrices.

*Wilkerson M and Waltman P (2013). ConsensusClusterPlus: ConsensusClusterPlus. R package version 1.24.0.

########################################################
library(ConsensusClusterPlus)

seed = 11111
d = matrix(rnorm(200000, 0, 1), ncol = 200) # 200 samples in columns, 1000 genes in rows
colnames(d) = paste("Samp", 1:200, sep = "")
rownames(d) = paste("Gene", 1:1000, sep = "")
d = sweep(d, 1, apply(d, 1, median, na.rm = TRUE)) # subtract each gene's median from its row

maxK = 6 # maximum number of clusters to try
results = ConsensusClusterPlus(d, maxK = maxK, reps = 50, pItem = 0.8, pFeature = 1, title = "test_run",
                               innerLinkage = "complete", seed = seed, plot = "pdf")
# Note that we implement consensus clustering with innerLinkage="complete".
# We advise against using innerLinkage="average", which is the default value in this package,
# as average linkage is not robust to outliers.

############## PAC implementation ##############
Kvec = 2:maxK
x1 = 0.1; x2 = 0.9 # threshold defining the intermediate sub-interval
PAC = rep(NA, length(Kvec))
names(PAC) = paste("K=", Kvec, sep = "") # from 2 to maxK
for(i in Kvec){
  M = results[[i]]$consensusMatrix
  Fn = ecdf(M[lower.tri(M)])
  PAC[i-1] = Fn(x2) - Fn(x1)
}#end for i

# The optimal K
optK = Kvec[which.min(PAC)]
########################################################
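If you want to sanity-check the PAC step without running ConsensusClusterPlus at all, here is a minimal sketch on two toy consensus matrices. The helper name `pac_score` and both 4 x 4 matrices are made up for illustration; they are not part of the package or the publication. The calculation itself is the same as in the loop above: Fn(x2) - Fn(x1) on the lower triangle of M.

```r
# Hypothetical helper (not part of ConsensusClusterPlus): PAC for one consensus matrix.
pac_score <- function(M, x1 = 0.1, x2 = 0.9) {
  Fn <- ecdf(M[lower.tri(M)])  # empirical CDF of the off-diagonal consensus values
  Fn(x2) - Fn(x1)              # fraction of sample pairs with x1 < consensus <= x2
}

# Toy consensus matrices, invented for illustration only.
# "Clean": samples 1-2 always co-cluster, samples 3-4 always co-cluster.
clean <- matrix(0, 4, 4)
clean[1:2, 1:2] <- 1
clean[3:4, 3:4] <- 1

# "Ambiguous": every pair of samples co-clusters in half of the resampling runs.
ambiguous <- matrix(0.5, 4, 4)
diag(ambiguous) <- 1

pac_clean <- pac_score(clean)          # 0: no pair falls in the intermediate interval
pac_ambiguous <- pac_score(ambiguous)  # 1: every pair falls in the intermediate interval
```

A low PAC means almost all pairs of samples are either always or never clustered together, which is why the K with the smallest PAC is taken as optimal.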