Selection: with tag distance-correlation [14 articles] 


Kernel-based measures of association

Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 10, No. 2. (March 2018), e1422,


Measures of association have been widely used for describing statistical relationships between two sets of variables. Traditionally, such association measures focus on specialized settings. Based on an in‐depth summary of existing common measures, we present a general framework for association measures that unifies existing methods and novel extensions based on kernels, including practical solutions to computational challenges. Specifically, we introduce association screening and variable selection via maximizing kernel‐based association measures. We also develop a backward dropping procedure for feature selection when ...


K-groups: a generalization of K-means clustering

(12 Nov 2017)


We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by ...


Partial distance correlation with methods for dissimilarities

The Annals of Statistics, Vol. 42, No. 6. (December 2014), pp. 2382-2412,


Distance covariance and distance correlation are scalar coefficients that characterize independence of random vectors in arbitrary dimension. Properties, extensions and applications of distance correlation have been discussed in the recent literature, but the problem of defining the partial distance correlation has remained an open question of considerable interest. The problem of partial distance correlation is more complex than partial correlation partly because the squared distance covariance is not an inner product in the usual linear space. For the definition of partial ...


Energy distance

Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 8, No. 1. (January 2016), pp. 27-38,


Energy distance is a metric that measures the distance between the distributions of random vectors. Energy distance is zero if and only if the distributions are identical, thus it characterizes equality of distributions and provides a theoretical foundation for statistical inference and analysis. Energy statistics are functions of distances between observations in metric spaces. As a statistic, energy distance can be applied to measure the difference between a sample and a hypothesized distribution or the difference between two or more samples ...
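
As an illustration of the statistic described in this abstract, the two-sample energy distance can be sketched as 2·mean|x − y| − mean|x − x'| − mean|y − y'| over all pairs of observations. The snippet below is a minimal sketch of that plug-in (V-statistic) form, not the authors' implementation; the function names `energy_distance`, `_as_points` and `_mean_dist` are hypothetical, and Euclidean distance is assumed.

```python
import math


def _as_points(sample):
    """Turn scalars into 1-tuples so scalars and vectors are handled alike."""
    return [tuple(p) if hasattr(p, "__iter__") else (float(p),) for p in sample]


def _mean_dist(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a, q in b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Plug-in estimate of the energy distance between two samples.

    Assumption: this is the simple V-statistic form, which includes the
    zero self-distances in the within-sample averages.
    """
    x, y = _as_points(x), _as_points(y)
    return 2.0 * _mean_dist(x, y) - _mean_dist(x, x) - _mean_dist(y, y)


if __name__ == "__main__":
    # Identical samples give zero; shifted samples give a positive value.
    print(energy_distance([0, 1], [0, 1]))
    print(energy_distance([0, 0], [1, 1]))
```

The nested loops cost O(nm) distance evaluations, which is acceptable for a sketch but not for large samples.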


Fast computing for distance covariance

Technometrics (25 June 2015),


Distance covariance and distance correlation have been widely adopted in measuring dependence of a pair of random variables or random vectors. If the computation of distance covariance and distance correlation is implemented directly according to its definition then its computational complexity is O(n^2), which is a disadvantage compared to other faster methods. In this paper we show that the computation of distance covariance and distance correlation of real valued random variables can be implemented by an O(n log n) algorithm and ...


Discussion of: Brownian distance covariance

The Annals of Applied Statistics, Vol. 3, No. 4. (5 December 2009), pp. 1295-1298,


Discussion on "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297] ...


Measuring and testing dependence by correlation of distances

The Annals of Statistics, Vol. 35, No. 6. (28 December 2007), pp. 2769-2794,


Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are ...
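
The "compact representation" mentioned in this abstract can be illustrated with double-centered pairwise distance matrices: the squared sample distance covariance is the mean of the entrywise product of the two centered matrices, and distance correlation normalizes it by the two distance variances. The following is a minimal sketch under that reading, computed directly in O(n^2) time; the names `distance_correlation` and `_centered_distances` are hypothetical and Euclidean distance is assumed.

```python
import math


def _centered_distances(sample):
    """Pairwise Euclidean distances, double-centered by row, column and grand means."""
    pts = [tuple(p) if hasattr(p, "__iter__") else (float(p),) for p in sample]
    n = len(pts)
    d = [[math.dist(p, q) for q in pts] for p in pts]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two paired samples of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(a)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / n ** 2

    dcov2 = mean_product(a, b)                    # squared distance covariance
    dvar_x, dvar_y = mean_product(a, a), mean_product(b, b)
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    # y = x^2 has zero Pearson correlation with x here, yet is clearly dependent on x.
    print(distance_correlation(xs, [v * v for v in xs]))
```

In this example the product-moment correlation is exactly zero, while the distance correlation is strictly positive, which is the behaviour the abstract contrasts with classical correlation.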


Brownian distance covariance

The Annals of Applied Statistics, Vol. 3, No. 4. (6 Oct 2010), pp. 1236-1265,


Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but generalize and extend these classical bivariate measures of dependence. Distance correlation characterizes independence: it is zero if and only if the random vectors are independent. The notion of covariance with respect to a stochastic process is introduced, and it is shown that population distance covariance coincides with the covariance with respect to Brownian motion; thus, ...


Rejoinder: Brownian distance covariance

The Annals of Applied Statistics, Vol. 3, No. 4. (5 Oct 2010), pp. 1303-1308,


Rejoinder to "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297] ...


Supplementary materials for: a proposal for an integrated modelling framework to characterise habitat pattern



In Estreguil et al. (Environ Modell Softw 52, 176-191, 2014), an integrated modelling framework is proposed to characterise habitat pattern. The modelling approach is there exemplified by deriving a set of twelve indices aggregated into four categories: general landscape composition, habitat morphology, edge interface and connectivity. The easy and reproducible computability is ensured with the integrated use of publicly available software (GUIDOS free-download software, Conefor free software) and of newly programmed tools. A statistical analysis is then conducted using classical linear ...


A proposal for an integrated modelling framework to characterise habitat pattern

Environmental Modelling & Software, Vol. 52 (February 2014), pp. 176-191,


[Highlights] [::] Habitat pattern characterisation as methodological guidance for fragmentation assessments (applied in Europe). [::] Reproducible integration of three landscape models with GIS and semantic array programming. [::] Four families of indices: landscape composition, edge interface, habitat morphology and connectivity. [::] New indices: edge interface context of morphological shapes; Power Weighted Probability of Dispersal family for connectivity. [::] Nonlinear statistical correlation analysis based on Brownian Distance Correlation. [Abstract] Harmonized information on habitat pattern, fragmentation and connectivity is one among the reporting needs of the biodiversity policy agenda. This paper ...


Detecting general multi-dimensional nonlinear correlations: the module "dist_corr" of the Mastrave modelling library

In Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling (2012),


Linear correlation analysis of complex nonlinear physical or computationally derived quantities, although straightforward to implement with the help of basic numerical tools, may be far from optimal in assessing the actual strength of existing relationships between quantities. Moreover, in many applications not only the correlation between pairs of quantities is of interest, but also the more general correlation between a certain group of quantities and another one. Multi-dimensional nonlinear correlation analysis may offer elegant and concise ways of exploring ...


Finding correlations in big data

Nature Biotechnology, Vol. 30, No. 4. (10 April 2012), pp. 334-335,


In today's era of large data sets, statistical methods that facilitate exploratory analyses to detect patterns and generate hypotheses are critical to progress in biology. Last year, David Reshef and colleagues published a new approach to such analysis, called the maximal information coefficient or MIC (Science 334, 1518–1524, 2011). Nature Biotechnology solicited comments from several practitioners versed in data-intensive biological research. Their responses not only highlight the appeal of methods like MIC for biological research, but also raise some important reservations as ...

