Title: | Implementation of Cluster-Polarization Coefficient |
---|---|
Description: | Implements cluster-polarization coefficient for measuring distributional polarization in single or multiple dimensions, as well as associated functions. Contains support for hierarchical clustering, k-means, partitioning around medoids, density-based spatial clustering with noise, and manually imposed cluster membership. Mehlhaff (forthcoming) <doi:10.1017/S0003055423001041>. |
Authors: | Isaac Mehlhaff [aut, cre, cph] |
Maintainer: | Isaac Mehlhaff <[email protected]> |
License: | CC0 |
Version: | 2.6.0 |
Built: | 2025-02-25 03:14:36 UTC |
Source: | https://github.com/imehlhaff/cpc |
Calculates correlation coefficient between two variables and returns a list containing the correlation estimate, its standard error, the p-value of a null-hypothesis significance test, and the number of observations used.
correlate(x, y, ...)
x | a numeric vector. |
y | a numeric vector. |
... | arguments passed to cor.test(). |
Additional arguments to alter the type of null-hypothesis significance test, the method used to calculate the correlation coefficient, the confidence level, or other options should be passed to correlate() and will be inherited by cor.test(). Note that unlike cor.test(), both arguments x and y are required.
Returns a list with elements containing the correlation coefficient estimate, its associated standard error, the p-value of a null-hypothesis significance test, and the number of observations used, all as numeric vectors of length 1.
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
correlate(data[, 1], data[, 2])
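The four-element return value described above can be approximated with base R alone. The sketch below is not the package's source: `correlate_sketch()` is a hypothetical name, and the standard-error formula sqrt((1 - r^2) / (n - 2)) is the usual expression for a Pearson correlation, assumed here because this section does not say how the package computes it.

```r
# Hypothetical stand-in for correlate(); not the package's implementation.
correlate_sketch <- function(x, y, ...) {
  # Keep only complete pairs so n matches what cor.test() actually uses
  ok <- stats::complete.cases(x, y)
  x <- x[ok]
  y <- y[ok]
  n <- length(x)

  # Extra arguments (method, alternative, conf.level, ...) go to cor.test()
  test <- stats::cor.test(x, y, ...)
  r <- unname(test$estimate)

  list(
    estimate = r,
    se = sqrt((1 - r^2) / (n - 2)),  # assumed formula, Pearson case
    p.value = test$p.value,
    n = n
  )
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
str(correlate_sketch(x, y))
```

Each element is a numeric vector of length 1, which matches the shape of the documented return value even if the internals differ.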
Implements clustering algorithms and calculates cluster-polarization coefficient. Contains support for hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering with noise, and manual assignment of cluster membership.
CPC( data, type, k = NULL, epsilon = NULL, model = FALSE, adjust = FALSE, cols = NULL, clusters = NULL, ... )
data | a numeric vector or |
type | a character string giving the type of clustering method to be used. See Details. |
k | the desired number of clusters. Required if |
epsilon | radius of epsilon neighborhood. Required if type = "dbscan". |
model | a logical indicating whether clustering model output should be returned. Defaults to FALSE. |
adjust | a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. |
cols | columns of |
clusters | column of |
... | arguments passed to other functions. |
type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is necessary, researcher has specified cluster assignments.
For all clustering methods, additional arguments to fine-tune clustering performance, such as the specific algorithm to be used, should be passed to CPC() and will be inherited by the specified clustering function. In particular, if type = "kmeans", using a large number of random starts is recommended. This can be specified with the nstart argument to kmeans(), passed directly to CPC().
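To illustrate what multiple random starts do, the sketch below calls kmeans() from base R directly, since that is the function that receives nstart when type = "kmeans". The simulated data and the values 1 and 30 are illustrative assumptions, not recommendations from the package.

```r
set.seed(42)
# Two well-separated groups in two dimensions, 25 rows each
data <- rbind(
  matrix(rnorm(50, 0, 1), ncol = 2),
  matrix(rnorm(50, 5, 1), ncol = 2)
)

# One random start versus the best of 30 random starts
fit_single <- kmeans(data, centers = 2, nstart = 1)
fit_multi  <- kmeans(data, centers = 2, nstart = 30)

# kmeans() keeps the start with the lowest total within-cluster sum of squares
c(single = fit_single$tot.withinss, multi = fit_multi$tot.withinss)
```

With the package loaded, the corresponding request would be CPC(data, "kmeans", k = 2, nstart = 30), following the argument-passing behavior described above.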
If type = "manual", data must contain a vector identifying cluster membership for each observation, and cols and clusters must be defined.
If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
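This section does not spell out the formula behind the returned value. As a rough illustration only, the sketch below assumes the CPC is the share of the total sum of squares that lies between clusters (BSS / TSS), an assumption drawn from the cited paper rather than from this section. `cpc_sketch()` is a hypothetical helper, and the adjusted CPC and standard errors are not attempted.

```r
# Hypothetical helper: share of total variation lying between clusters.
# Assumes CPC = BSS / TSS; this is not the package's source code.
cpc_sketch <- function(values, membership) {
  values <- as.matrix(values)
  grand_mean <- colMeans(values)

  # Total sum of squares: squared distance of every row to the grand mean
  tss <- sum(sweep(values, 2, grand_mean)^2)

  # Within-cluster sum of squares: squared distance to own cluster mean
  wss <- sum(vapply(split(seq_len(nrow(values)), membership), function(idx) {
    block <- values[idx, , drop = FALSE]
    sum(sweep(block, 2, colMeans(block))^2)
  }, numeric(1)))

  (tss - wss) / tss
}

set.seed(1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
membership <- c(rep(1, 25), rep(2, 25))

# Manually imposed membership
cpc_sketch(data, membership)

# Membership found by k-means; kmeans() reports the same ratio directly
fit <- kmeans(data, centers = 2, nstart = 30)
all.equal(cpc_sketch(data, fit$cluster), fit$betweenss / fit$totss)
```

Under this assumed definition, values near 1 indicate tight, well-separated clusters, and values near 0 indicate that cluster membership explains little of the variation.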
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)
CPC(data[, 1:2], "kmeans", k = 2)
CPC(data, "manual", cols = 1:2, clusters = 3)
Converts a numeric matrix to a data frame in the format required for "manual" CPC() calculation.
CPCdata.frame(data, cols, clusters)
data | a numeric |
cols | columns in |
clusters | column in |
Returns a data frame with dimensions identical to those of data.
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)
CPCdata.frame(data, 1:2, 3)
Calculates average Euclidean distance between means in arbitrary dimensions.
diff_multidim(data, cols, clusters)
data | a numeric vector or |
cols | columns of |
clusters | column of |
Returns a numeric vector of length 1.
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)
diff_multidim(data, 1:2, 3)
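One way to read "average Euclidean distance between means" is: compute one mean vector per cluster, take every pairwise distance between those vectors, and average them. The base R sketch below follows that reading. `mean_centroid_distance()` is a hypothetical name, and averaging over cluster pairs is an assumption about what diff_multidim() does rather than a copy of it.

```r
# Hypothetical helper; the averaging over cluster pairs is an assumption.
mean_centroid_distance <- function(data, cols, clusters) {
  data <- as.data.frame(data)

  # One row of column means per cluster
  centroids <- aggregate(data[, cols, drop = FALSE],
                         by = list(cluster = data[[clusters]]),
                         FUN = mean)

  # dist() returns every pairwise Euclidean distance between centroid rows
  mean(as.numeric(dist(centroids[, -1, drop = FALSE])))
}

set.seed(1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)
mean_centroid_distance(data, 1:2, 3)
```

With only two clusters, as here, the average reduces to the single distance between the two cluster means.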
Calculates two-dimensional Euclidean distance between all points and dimension means.
Euclidean(data)
data | an |
Returns a numeric vector of length 1.
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
Euclidean(data)
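The underlying distance is simple to write out. The sketch below computes, for each row of a two-column matrix, its Euclidean distance to the point formed by the two column means. `distance_to_means()` is a hypothetical helper that returns one distance per row; how Euclidean() collapses these into the single number it returns is not stated in this section, so that step is left out.

```r
# Hypothetical helper returning one distance per row; how the package
# reduces these to a single value is not stated in this section.
distance_to_means <- function(data) {
  data <- as.matrix(data)
  stopifnot(ncol(data) == 2)

  # Point formed by the mean of each dimension
  centre <- colMeans(data)

  sqrt((data[, 1] - centre[1])^2 + (data[, 2] - centre[2])^2)
}

set.seed(1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
head(distance_to_means(data))
```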
Calculates sums of squares for uni- or multi-dimensional numeric data using the distance matrix.
SS(data, ...)
data | a numeric vector or |
... | arguments passed to |
Returns a numeric vector of length 1.
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
SS(data)
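A sum of squares can be obtained from a distance matrix because the sum of squared deviations from the mean equals the sum of squared pairwise Euclidean distances divided by the number of observations. The base R sketch below checks that identity with dist(). It is not the package's SS(), and whether SS() uses exactly this expression is an assumption.

```r
set.seed(1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
n <- nrow(data)

# Sum of squares from pairwise distances: squared d_ij over all pairs, over n
ss_from_dist <- sum(dist(data)^2) / n

# Sum of squares from deviations around the column means
ss_from_means <- sum(sweep(data, 2, colMeans(data))^2)

all.equal(ss_from_dist, ss_from_means)
```

The distance-based form needs no explicit mean vector, so the same expression covers one dimension or many.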