Package 'CPC' reference manual

Title:	Implementation of Cluster-Polarization Coefficient
Description:	Implements cluster-polarization coefficient for measuring distributional polarization in single or multiple dimensions, as well as associated functions. Contains support for hierarchical clustering, k-means, partitioning around medoids, density-based spatial clustering with noise, and manually imposed cluster membership. Mehlhaff (forthcoming) <doi:10.1017/S0003055423001041>.
Authors:	Isaac Mehlhaff [aut, cre, cph]
Maintainer:	Isaac Mehlhaff <[email protected]>
License:	CC0
Version:	2.6.0
Built:	2025-02-25 03:14:36 UTC
Source:	https://github.com/imehlhaff/cpc

Test for Bivariate Correlation

Description

Calculates correlation coefficient between two variables and returns a list containing the correlation estimate, its standard error, the p-value of a null-hypothesis significance test, and the number of observations used.

Usage

correlate(x, y, ...)

Arguments

`x`	a numeric vector.
`y`	a numeric vector.
`...`	arguments passed to `cor.test()`.

Details

Additional arguments that alter the type of null-hypothesis significance test, the method used to calculate the correlation coefficient, the confidence level, or other options should be passed to correlate(), which forwards them to cor.test(). Note that, unlike cor.test(), correlate() requires both x and y.
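The forwarding described above can be sketched with base R alone. The wrapper below, `correlate_sketch()`, is a hypothetical stand-in written for illustration and is not the package's source. In particular, the standard-error formula is the usual approximation for Pearson's r and is only an assumption about how such a value might be computed.

```r
# Hypothetical stand-in for correlate(); shows how `...` reaches cor.test().
correlate_sketch <- function(x, y, ...) {
  test <- stats::cor.test(x, y, ...)      # extra arguments are forwarded unchanged
  r <- unname(test$estimate)
  n <- sum(stats::complete.cases(x, y))   # pairs with no missing values
  list(
    estimate = r,
    se = sqrt((1 - r^2) / (n - 2)),       # assumed formula, Pearson's r only
    p.value = test$p.value,
    n = n
  )
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

# `alternative` and `conf.level` are cor.test() arguments passed through `...`.
res <- correlate_sketch(x, y, alternative = "greater", conf.level = 0.9)
str(res)
```

Because the extra arguments travel through `...`, the wrapper does not need to know about each cor.test() option in advance.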

Value

Returns a list with elements containing the correlation coefficient estimate, its associated standard error, the p-value of a null-hypothesis significance test, and the number of observations used, all as numeric vectors of length 1.

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

correlate(data[, 1], data[, 2])

Cluster-Polarization Coefficient

Description

Implements clustering algorithms and calculates cluster-polarization coefficient. Contains support for hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering with noise, and manual assignment of cluster membership.

Usage

CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)

Arguments

`data`	a numeric vector or `n x k` matrix or data frame. If `type = "manual"`, `data` must be a matrix that includes a column identifying cluster membership for each observation; that column is named in the `clusters` argument.
`type`	a character string giving the type of clustering method to be used. See Details.
`k`	the desired number of clusters. Required if `type` is one of `"hclust"`, `"diana"`, `"kmeans"`, or `"pam"`.
`epsilon`	radius of epsilon neighborhood. Required if `type = "dbscan"`.
`model`	a logical indicating whether clustering model output should be returned. Defaults to `FALSE`.
`adjust`	a logical indicating whether the adjusted CPC should be calculated. Defaults to `FALSE`. Note that both CPC and adjusted CPC are automatically calculated and returned if `model = TRUE`.
`cols`	columns of `data` to be used in CPC calculation. Only used if `type = "manual"`.
`clusters`	column of `data` indicating cluster membership for each observation. Only used if `type = "manual"`.
`...`	additional arguments passed to the clustering function selected by `type`. See Details.

Details

type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is necessary, researcher has specified cluster assignments.

For all clustering methods, additional arguments to fine-tune clustering performance, such as the specific algorithm to be used, should be passed to CPC() and will be inherited by the specified clustering function. In particular, if type = "kmeans", using a large number of random starts is recommended. This can be specified with the nstart argument to kmeans(), passed directly to CPC().
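To illustrate the advice on random starts, the sketch below calls kmeans() directly, once with a single start and once with `nstart = 30`. Per the text above, the same `nstart` argument could be given to CPC() and would be passed on. The seed and the choice of 30 starts are arbitrary values picked for this illustration.

```r
set.seed(42)
# Two groups of 25 points each in two dimensions (rows 1-25 and rows 26-50).
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

# A single random initialization.
fit_one <- kmeans(data, centers = 2, nstart = 1)

# Thirty random initializations; kmeans() keeps the fit with the lowest
# total within-cluster sum of squares.
fit_many <- kmeans(data, centers = 2, nstart = 30)

c(single = fit_one$tot.withinss, multiple = fit_many$tot.withinss)
```

On well-separated data such as this, both fits usually agree. Multiple starts matter more when clusters overlap and a single initialization can settle in a poor local optimum.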

If type = "manual", data must contain a vector identifying cluster membership for each observation, and cols and clusters must be defined.
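The layout expected for manual cluster assignment can be sketched as follows. The hand computation at the end assumes that the coefficient is the between-cluster share of the total sum of squares. This manual does not spell out the formula, so treat that part as an illustration under that assumption and not as the package's own computation.

```r
set.seed(7)
values <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
membership <- c(rep(1, 25), rep(2, 25))
data <- cbind(values, membership)  # columns 1:2 hold the data, column 3 the clusters

cols <- 1:2      # what would be supplied as `cols`
clusters <- 3    # what would be supplied as `clusters`

x <- data[, cols, drop = FALSE]
g <- data[, clusters]

# Total sum of squares around the overall column means.
total_ss <- sum(sweep(x, 2, colMeans(x))^2)

# Within-cluster sum of squares, summed over the clusters.
within_ss <- sum(sapply(unique(g), function(cl) {
  xc <- x[g == cl, , drop = FALSE]
  sum(sweep(xc, 2, colMeans(xc))^2)
}))

# Assumed definition: share of the total variation lying between clusters.
manual_share <- 1 - within_ss / total_ss
manual_share
```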

Value

If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

CPC(data[,c(1:2)], "kmeans", k = 2)
CPC(data, "manual", cols = 1:2, clusters = 3)

Data Manipulation for CPC Calculation

Description

Converts numeric matrix to data frame with necessary format for "manual" CPC() calculation.

Usage

CPCdata.frame(data, cols, clusters)

Arguments

`data`	a numeric `n x k` matrix or data frame.
`cols`	columns in `data` to be used for calculating `CPC()`.
`clusters`	column in `data` giving cluster membership.

Value

Returns a data frame with dimensions identical to those of data.

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)
CPCdata.frame(data, 1:2, 3)

Multidimensional Difference-in-Means

Description

Calculates average Euclidean distance between means in arbitrary dimensions.

Usage

diff_multidim(data, cols, clusters)

Arguments

`data`	a numeric vector or `n x k` matrix or data frame containing a vector identifying cluster membership for each observation, to be passed to `clusters` argument.
`cols`	columns of `data` to be used in difference-in-means calculation.
`clusters`	column of `data` indicating cluster membership for each observation.

Value

Returns a numeric vector of length 1.
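One reading of "average Euclidean distance between means" is the mean of all pairwise distances between cluster centroids. The base-R sketch below follows that reading. It is an assumption about the quantity and not a copy of the function's code.

```r
set.seed(3)
values <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
membership <- c(rep(1, 25), rep(2, 25))

# One row of dimension means per cluster: column sums by group, divided by
# the group sizes (both rowsum() and table() order groups the same way).
centroids <- rowsum(values, membership) / as.vector(table(membership))

# dist() gives every pairwise Euclidean distance between centroids; with
# two clusters there is only one such distance, so the mean equals it.
avg_dist <- mean(as.vector(dist(centroids)))
avg_dist
```

The same code works unchanged for more than two clusters or more than two dimensions, since dist() handles any number of rows and columns.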

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

diff_multidim(data, 1:2, 3)

Euclidean Distance from Dimension Means

Description

Calculates two-dimensional Euclidean distance between all points and dimension means.

Usage

Euclidean(data)

Arguments

`data`	an `n x 2` matrix or data frame.

Value

Returns a numeric vector of length 1.
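The distance of each point from the dimension means can be computed in base R as shown in this minimal sketch. How the function combines these per-point distances into the single value it returns is not stated here, so the sketch stops at the per-point distances and makes no claim about that step.

```r
set.seed(5)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

# Subtract each column's mean, then take the row-wise Euclidean norm.
centered <- sweep(data, 2, colMeans(data))
point_dist <- sqrt(rowSums(centered^2))

head(point_dist)
```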

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

Euclidean(data)

Sum-of-Squares Calculation

Description

Calculates sums of squares for uni- or multi-dimensional numeric data using the distance matrix.

Usage

SS(data, ...)

Arguments

`data`	a numeric vector or `n x k` matrix or data frame.
`...`	arguments passed to `dist()`.

Value

Returns a numeric vector of length 1.
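A sum of squares can be obtained from a distance matrix because the sum of squared deviations from the centroid equals the sum of squared pairwise Euclidean distances divided by the number of observations. The sketch below checks that identity in base R. Whether SS() uses exactly this normalization is an assumption.

```r
set.seed(9)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
n <- nrow(data)

# Route 1: squared pairwise distances (each pair counted once), divided by n.
ss_from_dist <- sum(dist(data)^2) / n

# Route 2: squared deviations from the column means.
ss_from_centroid <- sum(sweep(data, 2, colMeans(data))^2)

all.equal(ss_from_dist, ss_from_centroid)
```

Working from the distance matrix means that a different metric can be swapped in through the arguments of dist(), although the identity above holds only for Euclidean distance.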

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
SS(data)

