Category: Genomics

How to use the PAC measure in consensus clustering?

12/6/2015

Every now and then, I get emails asking how to implement the PAC (proportion of ambiguous clustering) measure in consensus clustering.

The truth is, this code snippet should have been included in the publication in the first place, so apologies for not having done that. Below is sample code that computes PAC once you have obtained the consensus matrices (M). It uses the ConsensusClusterPlus* package in R to obtain the consensus matrices.
*Wilkerson M and Waltman P (2013). ConsensusClusterPlus: ConsensusClusterPlus. R package version 1.24.0.
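In symbols, the loop in the code below computes the following quantity for each candidate number of clusters K. The notation is ours, read directly off the code rather than copied from the publication:

$$
\mathrm{PAC}_K = F_K(x_2) - F_K(x_1), \qquad
F_K(c) = \binom{N}{2}^{-1} \sum_{i > j} \mathbf{1}\left\{ M^{(K)}_{ij} \le c \right\},
\qquad \hat{K} = \arg\min_{K} \mathrm{PAC}_K
$$

Here M^(K) is the N x N consensus matrix for K clusters (N samples), the sum runs over the lower triangle (i > j), matching `lower.tri(M)`, and x1 = 0.1, x2 = 0.9. PAC_K is therefore the fraction of sample pairs whose consensus index falls in the ambiguous interval (x1, x2].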
########################################################
library(ConsensusClusterPlus)
seed=11111
d = matrix(rnorm(200000,0,1),ncol=200) # 200 samples in columns, 1000 genes in rows
colnames(d) = paste("Samp",1:200,sep="")
rownames(d) = paste("Gene",1:1000,sep="")
d = sweep(d,1, apply(d,1,median,na.rm=TRUE)) # median-center each gene (row)
maxK = 6 # maximum number of clusters to try
results = ConsensusClusterPlus(d,maxK=maxK,reps=50,pItem=0.8,pFeature=1,title="test_run",
innerLinkage="complete",seed=seed,plot="pdf")

# Note that we run consensus clustering with innerLinkage="complete". We advise against innerLinkage="average", the default in this package, because average linkage is not robust to outliers.

############## PAC implementation ##############
Kvec = 2:maxK
x1 = 0.1; x2 = 0.9 # thresholds defining the intermediate sub-interval (x1, x2]
PAC = rep(NA,length(Kvec))
names(PAC) = paste("K=",Kvec,sep="") # from 2 to maxK

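for(i in Kvec){
  M = results[[i]]$consensusMatrix   # consensus matrix for K = i
  Fn = ecdf(M[lower.tri(M)])         # empirical CDF of the pairwise consensus indices
  PAC[i-1] = Fn(x2) - Fn(x1)         # fraction of pairs with consensus index in (x1, x2]
}#end for i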

# The optimal K is the one with the lowest PAC
optK = Kvec[which.min(PAC)]
########################################################
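For reuse outside the loop, the same calculation can be wrapped in a small function. This is a minimal sketch assuming only base R; `pac_score`, `clean` and `ambiguous` are hypothetical names introduced here, and the toy matrices are hand-built for illustration rather than produced by ConsensusClusterPlus.

```r
# Hypothetical helper (not part of ConsensusClusterPlus): PAC for a single
# consensus matrix, using the same definition as the loop above.
pac_score <- function(M, x1 = 0.1, x2 = 0.9) {
  # Consensus index of each unordered sample pair; the lower triangle
  # excludes the diagonal, which is always 1
  vals <- M[lower.tri(M)]
  Fn <- ecdf(vals)
  # Fraction of pairs whose consensus index lies in (x1, x2]
  Fn(x2) - Fn(x1)
}

# Toy consensus matrix: two perfectly stable clusters of 3 samples each,
# so every pair is either always (1) or never (0) clustered together
clean <- kronecker(diag(2), matrix(1, 3, 3))

# Same layout, but samples 1 and 4 are clustered together half the time
ambiguous <- clean
ambiguous[1, 4] <- ambiguous[4, 1] <- 0.5

pac_score(clean)      # 0
pac_score(ambiguous)  # 1/15 (about 0.0667): one ambiguous pair out of 15
```

With ConsensusClusterPlus output, `sapply(results[2:maxK], function(r) pac_score(r$consensusMatrix))` should return the same values as the `PAC` vector computed above (without the names); treat this one-liner as an untested convenience rather than part of the original code.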
