@simonohanlon101
Created February 5, 2013 15:02
Returns column names of variables with 2nd and 3rd highest correlation with row variable
#==================================
#
# Answer to http://stackoverflow.com/q/14702714/1478381
# Using R to find correlation pairs
# Author: Simon O'Hanlon
# Date: 5th February 2013
#
#==================================
# Construct a toy example of a symmetric matrix
# nc is the number of rows/columns in the matrix. In the question it was 4; here we use 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
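# Fill the lower triangle, then mirror it into the upper triangle so the matrix is symmetric
# (assigning dat to both triangles directly is not symmetric for nc > 3, because both are filled column by column)
mat[lower.tri( mat ) ] <- dat
mat[upper.tri( mat ) ] <- t( mat )[upper.tri( mat ) ]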
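# Sketch, not part of the original answer (`demo` and `vals` are hypothetical
# names used only for illustration): lower.tri() and upper.tri() both index in
# column-major order, so one vector assigned to each triangle lands in cells
# that are not mirror images once there are more than 3 rows. Copying through
# the transpose is one way to mirror the values exactly.
vals <- 1:6
demo <- diag( 1 , 4 )
demo[ lower.tri( demo ) ] <- vals
demo[ upper.tri( demo ) ] <- t( demo )[ upper.tri( demo ) ]
isSymmetric( demo )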
# Create vector of random string names for row/column names
# (stored as nms to avoid masking the base function names())
nms <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( nms , nms )
# Sanity check
mat
# Now to the problem at hand. You can substitute your own correlation matrix into the lines below.
# The diagonal of a correlation matrix is always 1, so it is excluded, as the question requires
diag( mat ) <- NA
# Now find the highest remaining (off-diagonal) correlation in each row and set it to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) )
# Another sanity check...!
mat
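# Now return the names of the two columns with the greatest remaining correlation in each row,
# i.e. the 2nd and 3rd highest overall (sort() drops the NA entries by default)
res <- t( apply( mat , 1 , function(x) { y <- names( sort( x , decreasing = TRUE ) )[1:2] ; return( y ) } ) )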
# Check the result
res
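# A minimal sketch, not part of the original answer: the steps above wrapped in
# one function that leaves its input untouched. The function name `second_third`
# and the example matrix `m` are hypothetical, chosen only for illustration.
second_third <- function( cm ) {
  diag( cm ) <- NA   # ignore each variable's correlation with itself
  t( apply( cm , 1 , function(x) {
    # order() places NA last, so positions 2 and 3 of the decreasing order
    # are the 2nd and 3rd highest off-diagonal correlations in this row
    names( x )[ order( x , decreasing = TRUE )[2:3] ]
  } ) )
}
# Small hand-made symmetric matrix so the result can be checked by eye
m <- matrix( c( 1.0 , 0.9 , 0.5 , 0.2 ,
                0.9 , 1.0 , 0.4 , 0.7 ,
                0.5 , 0.4 , 1.0 , 0.3 ,
                0.2 , 0.7 , 0.3 , 1.0 ) , nrow = 4 ,
             dimnames = list( c( "a" , "b" , "c" , "d" ) , c( "a" , "b" , "c" , "d" ) ) )
second_third( m )
# Row "a" correlates most with "b" (0.9), so its 2nd and 3rd highest are "c" and "d"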