
matrix_rank() estimates the rank of a given data matrix D by counting the number of "practically nonzero" singular values of D.

The rank of a matrix is the maximum number of linearly independent columns (equivalently, rows) in the matrix, and it summarizes the underlying structure of the data. Intuitively, it can be thought of as the number of independent latent patterns in the data.
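To make the definition concrete, here is a small base-R illustration. It uses base R's qr() (a QR decomposition with limited column pivoting) rather than matrix_rank(), and the matrix M is a made-up example in which one column is a linear combination of the others.

```r
# Made-up 4 x 3 matrix: the third column is the sum of the first two,
# so only two columns are linearly independent.
M <- cbind(c(1, 2, 3, 4), c(0, 1, 0, 1))
M <- cbind(M, M[, 1] + M[, 2])

qr(M)$rank      # column rank via base R's QR decomposition
#> [1] 2
qr(t(M))$rank   # row rank equals column rank
#> [1] 2
```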

A singular value \(s\) is considered "practically nonzero" if \(s \geq s_{\max} \cdot \text{thresh}\), i.e., if it is at least the largest singular value of D multiplied by the relative threshold thresh.
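The counting rule can be sketched in a few lines of base R. The helper count_nonzero_sv() below is hypothetical and is not pcpr's source code; it simply applies \(s \geq s_{\max} \cdot \text{thresh}\) to the singular values returned by svd(). One assumption to flag: it adds an explicit guard so that an all-zero matrix gets rank 0, because the literal rule would otherwise count every zero singular value (since \(0 \geq 0\)).

```r
# Hypothetical helper (not pcpr's implementation): counts "practically
# nonzero" singular values using the rule s >= s_max * thresh.
count_nonzero_sv <- function(D, thresh = NULL) {
  stopifnot(is.matrix(D), !anyNA(D))      # mirror the "no NA values" rule
  s <- svd(D, nu = 0, nv = 0)$d           # singular values only
  if (is.null(thresh)) {
    # Same default as documented for matrix_rank()
    thresh <- max(dim(D)) * .Machine$double.eps
  }
  s_max <- max(s)
  if (s_max == 0) return(0L)              # assumed guard: zero matrix has rank 0
  sum(s >= s_max * thresh)                # sum of logicals is an integer
}

count_nonzero_sv(diag(5))            # five equal singular values
#> [1] 5
count_nonzero_sv(diag(c(3, 2, 0)))   # one singular value is exactly 0
#> [1] 2
count_nonzero_sv(matrix(0, 3, 3))    # zero matrix
#> [1] 0
```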

Usage

matrix_rank(D, thresh = NULL)

Arguments

D

The input data matrix (cannot have NA values).

thresh

(Optional) A double \(> 0\) giving the relative threshold below which singular values are treated as "practically zero" when calculating the rank of D. By default, thresh = NULL, in which case the threshold is set to max(dim(D)) * .Machine$double.eps.
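As a rough illustration of how thresh changes the count, the sketch below applies the documented rule directly to the singular values from base R's svd() for a made-up diagonal matrix D whose singular values are 10, 1 and 0.001. It does not call matrix_rank() itself.

```r
# Made-up matrix with singular values 10, 1 and 0.001
D <- diag(c(10, 1, 1e-3))
s <- svd(D, nu = 0, nv = 0)$d

# Default relative threshold used when thresh = NULL
default_thresh <- max(dim(D)) * .Machine$double.eps
default_thresh
#> [1] 6.661338e-16
sum(s >= max(s) * default_thresh)   # all three values count as nonzero
#> [1] 3

# A looser threshold: the cutoff becomes 10 * 0.01 = 0.1,
# so the singular value 0.001 is treated as "practically zero"
sum(s >= max(s) * 1e-2)
#> [1] 2
```

Under the rule described above, matrix_rank(D, thresh = 1e-2) should likewise return 2 for this D, assuming the package follows that rule exactly.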

Value

An integer estimating the rank of D.

See also

Examples

data <- sim_data()
# Rank of the full simulated data matrix
matrix_rank(data$D)
#> [1] 10
# Rank of its low-rank component
matrix_rank(data$L)
#> [1] 3