K-groups: a generalization of K-means clustering

(12 Nov 2017)


We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by ...
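
As a rough illustration of the quantity this abstract builds on, the sketch below computes a two-sample energy distance in its simple plug-in form, 2E|X-Y| - E|X-X'| - E|Y-Y'|. It is not the k-groups algorithm and not the authors' multi-sample statistic; the function names `mean_pairwise_distance` and `energy_distance` are hypothetical, and NumPy is assumed.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (one row from a, one from b)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Plug-in two-sample energy distance (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as samples of scalar observations.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    between = mean_pairwise_distance(x, y)
    within_x = mean_pairwise_distance(x, x)
    within_y = mean_pairwise_distance(y, y)
    return 2.0 * between - within_x - within_y
```

Two samples with the same mean but different spread give a positive value here, which is the sense in which a distribution-based criterion can separate clusters that a mean-based criterion cannot.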


High resolution global gridded data for use in population studies

Scientific Data, Vol. 4 (31 January 2017), 170001,


Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project ...


2D Euclidean distance transform algorithms: a comparative survey

ACM Computing Surveys, Vol. 40, No. 1. (February 2008), pp. 1-44,


The distance transform (DT) is a general operator forming the basis of many methods in computer vision and geometry, with great potential for practical applications. However, all the optimal algorithms for the computation of the exact Euclidean DT (EDT) were proposed only since the 1990s. In this work, state-of-the-art sequential 2D EDT algorithms are reviewed and compared, in an effort to reach more solid conclusions regarding their differences in speed and their exactness. Six of the best algorithms were fully implemented ...


A general algorithm for computing distance transforms in linear time

In Mathematical Morphology and its Applications to Image and Signal Processing, Vol. 18 (2000), pp. 331-340,


A new general algorithm for computing distance transforms of digital images is presented. The algorithm consists of two phases. Both phases consist of two scans, a forward and a backward scan. The first phase scans the image column-wise, while the second phase scans the image row-wise. Since the computation per row (column) is independent of the computation of other rows (columns), the algorithm can be easily parallelized on shared memory computers. The algorithm can be used for the computation of the ...
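
The two-phase structure described here can be sketched as follows, for the squared Euclidean case. Phase 1 follows the abstract (a forward and a backward scan along each column). Phase 2 is deliberately replaced by a brute-force minimum along each row, so this sketch is not linear time and is not the paper's algorithm; the name `squared_edt` and the boolean `feature` mask are assumptions made for illustration.

```python
import numpy as np


def squared_edt(feature):
    """Squared Euclidean distance to the nearest True cell (sketch)."""
    feature = np.asarray(feature, dtype=bool)
    rows, cols = feature.shape
    big = rows + cols  # exceeds any distance that can occur inside the image
    g = np.where(feature, 0, big).astype(np.int64)

    # Phase 1: per-column distance to the nearest feature cell.
    for y in range(1, rows):               # forward scan, top to bottom
        g[y] = np.minimum(g[y], g[y - 1] + 1)
    for y in range(rows - 2, -1, -1):      # backward scan, bottom to top
        g[y] = np.minimum(g[y], g[y + 1] + 1)

    # Phase 2 (brute force stand-in): combine column distances along each row.
    xs = np.arange(cols)
    dx2 = (xs[:, None] - xs[None, :]) ** 2  # dx2[x, i] = (x - i)^2
    out = np.empty((rows, cols), dtype=np.int64)
    for y in range(rows):
        out[y] = (dx2 + g[y][None, :] ** 2).min(axis=1)
    return out
```

Each column in phase 1 and each row in phase 2 is processed independently of the others, which is the property the abstract points to for parallelization.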


An evaluation of void-filling interpolation methods for SRTM data

International Journal of Geographical Information Science, Vol. 21, No. 9. (1 October 2007), pp. 983-1008,


The Digital Elevation Model that has been derived from the February 2000 Shuttle Radar Topography Mission (SRTM) has been one of the most important publicly available new spatial data sets in recent years. However, the ‘finished’ grade version of the data (also referred to as Version 2) still contains data voids (some 836,000 km2)—and other anomalies—that prevent immediate use in many applications. These voids can be filled using a range of interpolation algorithms in conjunction with other sources of elevation data, ...


Novel three-step pseudo-absence selection technique for improved species distribution modelling

PLOS ONE, Vol. 8, No. 8. (13 August 2013), e71218,


Pseudo-absence selection for spatial distribution models (SDMs) is the subject of ongoing investigation. Numerous techniques continue to be developed, and reports of their effectiveness vary. Because the quality of presence and absence data is key for acceptable accuracy of correlative SDM predictions, determining an appropriate method to characterise pseudo-absences for SDMs is vital. The main methods that are currently used to generate pseudo-absence points are: 1) randomly generated pseudo-absence locations from background data; 2) pseudo-absence locations generated within a delimited geographical ...


Equality in maternal and newborn health: modelling geographic disparities in utilisation of care in five East African countries

PLOS ONE, Vol. 11, No. 8. (25 August 2016), e0162006,


Geographic accessibility to health facilities represents a fundamental barrier to utilisation of maternal and newborn health (MNH) services, driving historically hidden spatial pockets of localized inequalities. Here, we examine utilisation of MNH care as an emergent property of accessibility, highlighting high-resolution spatial heterogeneity and sub-national inequalities in receiving care before, during, and after delivery throughout five East African countries. We calculated a geographic inaccessibility score to the nearest health facility at 300 x 300 m using a dataset of 9,314 facilities ...


Travel time to major cities: a global map of accessibility



[Excerpt: Background] The world is shrinking. Cheap flights, large scale commercial shipping and expanding road networks all mean that we are better connected to everywhere else than ever before. But global travel and international trade are just two of the forces that have reshaped our world. A new map of Travel Time to Major Cities - developed by the European Commission and the World Bank - captures this connectivity and the concentration of economic activity and also highlights that there is little ...


Modeling potential distribution and carbon dynamics of natural terrestrial ecosystems: a case study of Turkey

Sensors, Vol. 7, No. 10. (11 October 2007), pp. 2273-2296,


We derived a simple model that relates the classification of biogeoclimate zones, (co)existence and fractional coverage of plant functional types (PFTs), and patterns of ecosystem carbon (C) stocks to long-term average values of biogeoclimatic indices in a time- and space-varying fashion from climate–vegetation equilibrium models. Proposed Dynamic Ecosystem Classification and Productivity (DECP) model is based on the spatial interpolation of annual biogeoclimatic variables through multiple linear regression (MLR) models and inverse distance weighting (IDW) and was applied to the entire Turkey of 780,595 km2 on a 500 m ...
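
Inverse distance weighting, one of the two interpolation steps named in this abstract, can be sketched generically as below. This is the textbook form of IDW, not the DECP model's configuration: the function name `idw`, the default `power=2.0`, and the choice to return the station value when a query point coincides with a station are all assumptions.

```python
import numpy as np


def idw(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighted estimate at each query point (sketch)."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # dist[k, j] = Euclidean distance from query k to known point j.
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)

    out = np.empty(len(query_xy))
    for k, row in enumerate(dist):
        exact = row == 0
        if exact.any():
            # Query coincides with a known point: use its value directly.
            out[k] = known_values[exact].mean()
        else:
            weights = 1.0 / row ** power
            out[k] = (weights * known_values).sum() / weights.sum()
    return out
```

A larger `power` makes the estimate more local, because distant points lose weight faster.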


Seed dispersal distances: a typology based on dispersal modes and plant traits

Botanica Helvetica, Vol. 117, No. 2. (7 December 2007), pp. 109-124,


The ability of plants to disperse seeds may be critical for their survival under the current constraints of landscape fragmentation and climate change. Seed dispersal distance would therefore be an important variable to include in species distribution models. Unfortunately, data on dispersal distances are scarce, and seed dispersal models only exist for some species with particular dispersal modes. To overcome this lack of knowledge, we propose a simple approach to estimate seed dispersal distances for a whole regional flora. We reviewed ...


Is the C-terminal insertional signal in Gram-negative bacterial outer membrane proteins species-specific or not?

BMC Genomics, Vol. 13, No. 1. (26 September 2012), 510,


BACKGROUND: In Gram-negative bacteria, the outer membrane is composed of an asymmetric lipid bilayer of phospholipids and lipopolysaccharides, and the transmembrane proteins that reside in this membrane are almost exclusively beta-barrel proteins. These proteins are inserted into the membrane by a highly conserved and essential machinery, the BAM complex. It recognizes its substrates, unfolded outer membrane proteins (OMPs), through a C-terminal motif that has been speculated to be species-specific, based on theoretical and experimental results from only two species, Escherichia coli and ...


Clustering by fast search and find of density peaks

Science, Vol. 344, No. 6191. (26 June 2014), pp. 1492-1496,


[Abstract] Cluster analysis is aimed at classifying elements into categories on the basis of their similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern recognition. We propose an approach based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This idea forms the basis of a clustering procedure in which the number of clusters arises intuitively, outliers are automatically spotted and excluded ...
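
The idea stated in this abstract (centers have high local density and lie far from any point of higher density) can be sketched with two per-point scores. The code below only computes those scores; it does not select centers or assign clusters. The name `density_peaks_scores`, the hard-cutoff density count, and the handling of ties (points that share the top density all receive their maximum distance) are assumptions of this sketch.

```python
import numpy as np


def density_peaks_scores(points, cutoff):
    """Return (rho, delta): local density and distance to a denser point."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # rho: number of other points closer than the cutoff distance.
    rho = (dist < cutoff).sum(axis=1) - 1  # subtract 1 to exclude the point itself

    # delta: distance to the nearest point with strictly higher density.
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()  # no denser point exists
    return rho, delta
```

Points with both large `rho` and large `delta` are candidate centers, while a point with small `rho` and large `delta` looks like an isolated outlier.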


GRASS GIS manual: r.neighbors

In GRASS Development Team, 2013. GRASS GIS 6.4.3svn Reference Manual (2010)


r.neighbors - Makes each cell category value a function of the category values assigned to the cells around it, and stores new cell values in an output raster map layer. ...
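
The kind of operation described here, where each output cell is a function of the cells around it, can be illustrated with a plain NumPy moving window. This is not GRASS code and does not reproduce the module's options; `neighborhood_apply`, its `size` parameter, and the choice to shrink the window at the raster edge are assumptions of this sketch.

```python
import numpy as np


def neighborhood_apply(grid, func=np.mean, size=3):
    """Apply func to the size x size window around every cell (sketch)."""
    grid = np.asarray(grid, dtype=float)
    half = size // 2
    rows, cols = grid.shape
    out = np.empty_like(grid)
    for r in range(rows):
        for c in range(cols):
            # Windows are clipped at the raster border rather than padded.
            window = grid[max(r - half, 0):r + half + 1,
                          max(c - half, 0):c + half + 1]
            out[r, c] = func(window)
    return out
```

Passing `np.max`, `np.min`, or `np.median` as `func` gives the corresponding neighborhood statistic.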


GRASS GIS manual: r.drain

In GRASS Development Team, 2013. GRASS GIS 6.4.3svn Reference Manual (2004)


r.drain - Traces a flow through an elevation model on a raster map. ...
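
Tracing a flow path over an elevation grid can be sketched as a repeated step to the lowest neighboring cell. This is a simplified illustration, not the GRASS implementation: `trace_drain` is a hypothetical name, the rule picks the lowest of the eight neighbors without correcting for the longer diagonal step, and the trace stops at the first cell that has no lower neighbor.

```python
def trace_drain(elev, start):
    """Follow the lowest neighbor downhill from start; return the path."""
    rows, cols = len(elev), len(elev[0])
    r, c = start
    path = [(r, c)]
    while True:
        best = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and elev[nr][nc] < elev[r][c]:
                    if best is None or elev[nr][nc] < elev[best[0]][best[1]]:
                        best = (nr, nc)
        if best is None:
            return path  # local minimum or flat area: nowhere lower to go
        r, c = best
        path.append(best)
```

Because every step moves to a strictly lower cell, the loop always terminates.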


GRASS GIS manual: r.cost

In GRASS Development Team, 2013. GRASS GIS 6.4.3svn Reference Manual (2011)


r.cost - Creates a raster map showing the cumulative cost of moving between different geographic locations on an input raster map whose cell category values represent cost. ...
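
A cumulative cost surface of the kind described here can be sketched with Dijkstra's algorithm on the grid. This is a generic illustration rather than the GRASS implementation: `cumulative_cost` is a hypothetical name, moves are limited to the four edge neighbors, and charging each move the average of the two cell costs is an assumption of this sketch.

```python
import heapq


def cumulative_cost(cost, starts):
    """Least accumulated cost from any start cell to every cell (sketch)."""
    rows, cols = len(cost), len(cost[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in starts:
        best[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        acc, r, c = heapq.heappop(heap)
        if acc > best[r][c]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Assumed move cost: mean of the two cells' cost values.
                new = acc + (cost[r][c] + cost[nr][nc]) / 2.0
                if new < best[nr][nc]:
                    best[nr][nc] = new
                    heapq.heappush(heap, (new, nr, nc))
    return best
```

With a single expensive cell in the middle of a cheap grid, the accumulated cost to the far corner follows the route around that cell.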

