A data-driven approach to assess large fire size generation in Greece

Natural Hazards, Vol. 88, No. 3. (2017), pp. 1591-1607, https://doi.org/10.1007/s11069-017-2934-z

Abstract

Identifying factors and drivers which control large fire size generation is critical for planning fire management activities. This study attempts to determine the role of fire suppression tactics and behavior, weather, topography and landscape features on two different datasets of large fire size (500–1000 ha) and very large fire size (>1000 ha) compared to two datasets of small fire size (<50 ha) which occurred in Greece, during the period 1984–2009. In this context, we used a logistic regression (LR) analysis and ...
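
As an annotation (not from the paper): the logistic regression (LR) analysis mentioned above can be sketched in a few lines of pure Python. The two covariates and the labels below are synthetic stand-ins, not the study's fire data, and the fitting hyperparameters are arbitrary.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent on log-loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss: (predicted probability - label) times input.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

# Synthetic example: two illustrative covariates; label 1 stands for "large fire".
random.seed(0)
X = [(random.random(), random.random()) for _ in range(200)]
y = [1 if xi[0] + xi[1] > 1.0 else 0 for xi in X]
w, b = fit_logistic(X, y)
predict = lambda xi: sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5
accuracy = sum(predict(xi) == bool(yi) for xi, yi in zip(X, y)) / len(X)
```

The fitted weights are then read as log-odds contributions of each covariate to large-fire occurrence.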

Statistical modeling: the two cultures (with comments and a rejoinder by the author)

Statistical Science, Vol. 16, No. 3. (August 2001), pp. 199-231, https://doi.org/10.1214/ss/1009213726

Abstract

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in ...

Stacked generalization

Neural Networks, Vol. 5, No. 2. (January 1992), pp. 241-259, https://doi.org/10.1016/s0893-6080(05)80023-1

Abstract

This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When ...
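
A toy sketch of the scheme (illustrative only; the fold count, the two base learners, and the closed-form combiner below are our arbitrary choices, not Wolpert's): level-0 learners are trained on parts of the learning set, their out-of-fold guesses become the inputs of a level-1 combiner.

```python
import random

def kfold(n, k):
    """Yield (train, test) index lists for a k-fold partition."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        yield [j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]

def mean_learner(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def line_learner(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x, a=a, b=my - a * mx: a * x + b

def stacked(xs, ys, learners, k=5):
    # Level-1 data: each point's out-of-fold guesses from every level-0 learner.
    z = [[0.0] * len(learners) for _ in xs]
    for train, test in kfold(len(xs), k):
        models = [L([xs[i] for i in train], [ys[i] for i in train]) for L in learners]
        for i in test:
            z[i] = [m(xs[i]) for m in models]
    # Level-1 combiner: the weight alpha on learner 0 minimizing squared error
    # of alpha*z0 + (1-alpha)*z1 (closed form for exactly two learners).
    d = [zi[0] - zi[1] for zi in z]
    r = [yi - zi[1] for yi, zi in zip(ys, z)]
    alpha = sum(ri * di for ri, di in zip(r, d)) / sum(di * di for di in d)
    final = [L(xs, ys) for L in learners]   # retrain level-0 on the full set
    return lambda x: alpha * final[0](x) + (1 - alpha) * final[1](x)

random.seed(1)
xs = [i / 50 for i in range(100)]
ys = [2 * x + random.uniform(-0.1, 0.1) for x in xs]
model = stacked(xs, ys, [mean_learner, line_learner])
```

On this toy data the combiner learns to put nearly all its weight on the linear learner, which is exactly the bias-correction role the abstract describes.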

Deep Learning

by Ian Goodfellow, Yoshua Bengio and Aaron Courville
MIT Press (2016)

Abstract

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. ...

Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification

Pattern Recognition, Vol. 61 (4 Feb 2016), pp. 539-556, https://doi.org/10.1016/j.patcog.2016.07.001

Abstract

We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets) in scenarios different from the ones they were trained for: full training, fine tuning, and using ConvNets as feature extractors. In many applications, especially including remote sensing, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and demands high computational costs. Therefore, it is important to understand how to obtain the best profit from existing ConvNets. We perform ...

Deep learning in remote sensing: a review

IEEE Geoscience and Remote Sensing Magazine, Vol. 5, No. 4. (11 Oct 2017), pp. 8-36, https://doi.org/10.1109/mgrs.2017.2762307

Abstract

Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven as an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or, should we resist a 'black-box' solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep ...

The lack of a priori distinctions between learning algorithms

Neural Computation, Vol. 8, No. 7. (1 October 1996), pp. 1341-1390, https://doi.org/10.1162/neco.1996.8.7.1341

Abstract

This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are “as many” targets (or priors over targets) for which A has lower ...

What can machine learning do? Workforce implications

Science, Vol. 358, No. 6370. (22 December 2017), pp. 1530-1534, https://doi.org/10.1126/science.aap8062

Abstract

Digital computers have transformed work in almost every sector of the economy over the past several decades (1). We are now at the beginning of an even larger and more rapid transformation due to recent advances in machine learning (ML), which is capable of accelerating the pace of automation itself. However, although it is clear that ML is a “general purpose technology,” like the steam engine and electricity, which spawns a plethora of additional innovations and capabilities (2), there is no ...

Resampling methods for meta-model validation with recommendations for evolutionary computation

Evolutionary Computation, Vol. 20, No. 2. (16 February 2012), pp. 249-275, https://doi.org/10.1162/evco_a_00069

Abstract

Meta-modeling has become a crucial tool in solving expensive optimization problems. Much of the work in the past has focused on finding a good regression method to model the fitness function. Examples include classical linear regression, splines, neural networks, Kriging and support vector regression. This paper specifically draws attention to the fact that assessing model accuracy is a crucial aspect in the meta-modeling framework. Resampling strategies such as cross-validation, subsampling, bootstrapping, and nested resampling are prominent methods for model validation and ...
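
One of the resampling strategies named above, sketched for illustration (the helper names and the toy "meta-model" are ours): the bootstrap, scoring each refit model on its out-of-bag points.

```python
import random

def bootstrap_oob_error(xs, ys, fit, loss, n_rounds=50, seed=0):
    """Estimate generalization error by bootstrap resampling: refit the model
    on each bootstrap sample, evaluate it on the out-of-bag points."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_rounds):
        sample = [rng.randrange(n) for _ in range(n)]       # draw with replacement
        oob = set(range(n)) - set(sample)                   # points left out
        if not oob:
            continue
        model = fit([xs[i] for i in sample], [ys[i] for i in sample])
        errors.append(sum(loss(model(xs[i]), ys[i]) for i in oob) / len(oob))
    return sum(errors) / len(errors)

# Toy check: a constant-mean "meta-model" on noisy data; the OOB squared error
# should approach the noise variance (here 1/3 for uniform(-1, 1) noise).
xs = list(range(100))
random.seed(1)
ys = [5.0 + random.uniform(-1, 1) for _ in xs]
fit_mean = lambda X, Y: (lambda x, m=sum(Y) / len(Y): m)
sq = lambda p, t: (p - t) ** 2
err = bootstrap_oob_error(xs, ys, fit_mean, sq)
```

Cross-validation, subsampling, and nested resampling differ only in how the train/evaluation splits are drawn, so they slot into the same loop structure.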

Combining multiple classifiers: an application using spatial and remotely sensed information for land cover type mapping

Remote Sensing of Environment, Vol. 74, No. 3. (December 2000), pp. 545-556, https://doi.org/10.1016/s0034-4257(00)00145-0

Abstract

This article discusses two new methods for increasing the accuracy of classifiers used in land cover mapping. The first method, called the product rule, is a simple and general method of combining two or more classification rules as a single rule. Stacked regression methods of combining classification rules are discussed and compared to the product rule. The second method of increasing classifier accuracy is a simple nonparametric classifier that uses spatial information for classification. Two data sets used for land cover mapping ...

Bagging ensemble selection for regression

In AI 2012: Advances in Artificial Intelligence, Vol. 7691 (2012), pp. 695-706, https://doi.org/10.1007/978-3-642-35101-3_59

Abstract

Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on binary classification problems have shown that using random trees as base classifiers, BES-OOB (the most successful variant of BES) is competitive with (and in many cases, superior to) other ensemble learning strategies, for instance, the original ES algorithm, stacking with linear regression, random forests or boosting. Motivated by ...

Bagging ensemble selection

In AI 2011: Advances in Artificial Intelligence, Vol. 7106 (2011), pp. 251-260, https://doi.org/10.1007/978-3-642-25832-9_26

Abstract

Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The method has been highlighted in winning solutions of many data mining competitions, such as the Netflix competition, the KDD Cup 2009 and 2010, the UCSD FICO contest 2010, and a number of data mining competitions on the Kaggle platform. In this paper we present a novel variant: bagging ensemble selection. ...
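
A sketch of plain ensemble selection, the building block that BES bags (purely illustrative; the stump library and hillclimbing data are invented here): greedily add, with replacement, whichever library model most improves averaged predictions on a hillclimb set. BES would repeat this over bootstrap replicates of the training data and average the resulting ensembles.

```python
import random

def ensemble_selection(library, X_hill, y_hill, max_models=10):
    """Greedy forward ensemble selection: at each step add, with replacement,
    the library model that most improves averaged-probability accuracy on a
    held-out hillclimbing set."""
    def accuracy(models):
        correct = 0
        for x, t in zip(X_hill, y_hill):
            p = sum(m(x) for m in models) / len(models)   # average probability
            correct += (p > 0.5) == bool(t)
        return correct / len(X_hill)

    ensemble, best_acc = [], 0.0
    for _ in range(max_models):
        acc, j = max((accuracy(ensemble + [m]), j) for j, m in enumerate(library))
        if acc < best_acc:          # no candidate improves the ensemble: stop
            break
        ensemble.append(library[j])
        best_acc = acc
    return ensemble, best_acc

# Library: decision stumps "P(class=1) = 1 if x > t else 0" at assorted thresholds.
def stump(t):
    return lambda x: 1.0 if x > t else 0.0

library = [stump(t / 10) for t in range(1, 10)]
random.seed(2)
X = [random.random() for _ in range(300)]
y = [1 if x > 0.5 else 0 for x in X]
ens, acc = ensemble_selection(library, X, y)
```

Selection with replacement lets strong models accumulate weight in the average, which is part of why the method fares well in the competitions listed above.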

SoilGrids250m: Global gridded soil information based on machine learning

PLOS ONE, Vol. 12, No. 2. (16 February 2017), e0169748, https://doi.org/10.1371/journal.pone.0169748

Abstract

This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference ...

Estimating future burned areas under changing climate in the EU-Mediterranean countries

Science of The Total Environment, Vol. 450-451 (April 2013), pp. 209-222, https://doi.org/10.1016/j.scitotenv.2013.02.014

Abstract

The impacts of climate change on forest fires have received increased attention in recent years at both continental and local scales. It is widely recognized that weather plays a key role in extreme fire situations. It is therefore of great interest to analyze projected changes in fire danger under climate change scenarios and to assess the consequent impacts of forest fires. In this study we estimated burned areas in the European Mediterranean (EU-Med) countries under past and future climate conditions. Historical ...

Random forests

Machine Learning, Vol. 45, No. 1. (2001), pp. 5-32, https://doi.org/10.1023/a:1010933404324

Abstract

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random ...
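
A miniature of the idea (ours, purely illustrative, not Breiman's implementation): each "tree" here is only a depth-1 stump grown on a bootstrap sample with a random feature subset, and the forest classifies by majority vote.

```python
import random

def best_stump(X, y, feat_idx):
    """Best axis-aligned split over the given feature subset (a depth-1 tree)."""
    best = None
    for j in feat_idx:
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            lmaj = round(sum(left) / len(left))     # majority label on each side
            rmaj = round(sum(right) / len(right))
            correct = sum(yi == lmaj for yi in left) + sum(yi == rmaj for yi in right)
            if best is None or correct > best[0]:
                best = (correct, j, t, lmaj, rmaj)
    _, j, t, lmaj, rmaj = best
    return lambda x: lmaj if x[j] <= t else rmaj

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))                       # features tried per tree
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(d), k)             # random feature subset
        trees.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda x: round(sum(t(x) for t in trees) / len(trees))

# Toy data: two correlated features, label determined by the first.
random.seed(3)
X = [[random.random(), 0.0] for _ in range(200)]
for row in X:
    row[1] = row[0] + random.uniform(-0.05, 0.05)
y = [1 if row[0] > 0.5 else 0 for row in X]
forest = random_forest(X, y)
acc = sum(forest(x) == t for x, t in zip(X, y)) / len(X)
```

The random feature subsets are what decorrelate the trees; the abstract's point is that forest error is governed by individual tree strength versus inter-tree correlation.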

Predicting habitat suitability with machine learning models: The potential area of Pinus sylvestris L. in the Iberian Peninsula

Ecological Modelling, Vol. 197, No. 3-4. (August 2006), pp. 383-393, https://doi.org/10.1016/j.ecolmodel.2006.03.015

Abstract

We present a modelling framework for predicting forest areas. The framework is obtained by integrating a machine learning software suite within the GRASS Geographical Information System (GIS) and by providing additional methods for predictive habitat modelling. Three machine learning techniques (Tree-Based Classification, Neural Networks and Random Forest) are available in parallel for modelling from climatic and topographic variables. Model evaluation and parameter selection are measured by sensitivity-specificity ROC analysis, while the final presence and absence maps are obtained through maximisation of ...

Does the interpolation accuracy of species distribution models come at the expense of transferability?

Ecography, Vol. 35, No. 3. (March 2012), pp. 276-288, https://doi.org/10.1111/j.1600-0587.2011.06999.x

Abstract

Model transferability (extrapolative accuracy) is one important feature in species distribution models, required in several ecological and conservation biological applications. This study uses 10 modelling techniques and nationwide data on both (1) species distribution of birds, butterflies, and plants and (2) climate and land cover in Finland to investigate whether good interpolative prediction accuracy for models comes at the expense of transferability – i.e. markedly worse performance in new areas. Models’ interpolation and extrapolation performance was primarily assessed using AUC (the ...

The Neighbor Search approach applied to reservoir optimal operation: the Hoa Binh case study

No. hal-00698200. (2006)

Abstract

[Conclusion] The focus of this thesis is to show, through a real-case application, how the NS can be useful to improve the decision-making process for multi-objective reservoir operation planning. After a survey of the principal techniques employed in the literature to solve such problems, the NS algorithm has been discussed. Further, the case study has been presented. Hoa Binh is the largest reservoir in Vietnam, providing 40% of its total power supplies and protecting the capital Hanoi from major flooding events. This double purpose generates conflicts in its management ...


Research priorities for robust and beneficial artificial intelligence

(January 2015)

Abstract

[Executive Summary] Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to research how to maximize these benefits while avoiding potential pitfalls. This document gives numerous examples (which should by no means be construed as an exhaustive list) of such worthwhile research aimed at ensuring that AI remains robust and beneficial. [Research Priorities for Robust and Beneficial Artificial Intelligence: an Open Letter] Artificial intelligence (AI) research has explored a variety of problems and approaches since ...

Neural Turing machines

(10 Dec 2014)

Abstract

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. ...

Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 1. (January 2012), pp. 187-193, https://doi.org/10.1109/tpami.2011.177

Abstract

We consider two desired properties of learning algorithms: *sparsity* and *algorithmic stability*. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that $\ell_1$-regularized regression (Lasso) cannot be stable, while $\ell_2$-regularized regression is known to have strong stability properties ...

A survey of multiple classifier systems as hybrid systems

Information Fusion, Vol. 16 (March 2014), pp. 3-17, https://doi.org/10.1016/j.inffus.2013.04.006

Abstract

A current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or datasets building approaches. These systems perform information fusion of classification decisions at different levels overcoming limitations of traditional approaches based on single classifiers. This paper presents an up-to-date survey on multiple classifier system (MCS) from the point of view of Hybrid Intelligent Systems. The article discusses major issues, such as diversity and decision ...

Comparing and combining physically-based and empirically-based approaches for estimating the hydrology of ungauged catchments

Journal of Hydrology, Vol. 508 (January 2014), pp. 227-239, https://doi.org/10.1016/j.jhydrol.2013.11.007

Abstract

[Highlights] [::] Methods for estimating various hydrological indices at ungauged sites were compared. [::] Methods included a TopNet rainfall-runoff model and a Random Forest empirical model. [::] TopNet estimates were improved through correction using Random Forest estimates. [::] Random Forests provided the best estimates of all indices except mean flow. [::] Mean flow was best estimated using an already published empirical method. [Summary] Predictions of hydrological regimes at ungauged sites are required for various purposes such as setting environmental flows, assessing availability of water resources or ...

Shifts in Arctic vegetation and associated feedbacks under climate change

Nature Climate Change, Vol. 3, No. 7. (31 July 2013), pp. 673-677, https://doi.org/10.1038/nclimate1858

Abstract

Climate warming has led to changes in the composition, density and distribution of Arctic vegetation in recent decades [1-4]. These changes cause multiple opposing feedbacks between the biosphere and atmosphere [5-9], the relative magnitudes of which will have globally significant consequences but are unknown at a pan-Arctic scale [10]. The precise nature of Arctic vegetation change under future warming will strongly influence climate feedbacks, yet Earth system modelling studies have so far assumed arbitrary increases in shrubs ...

A statistical explanation of MaxEnt for ecologists

Diversity and Distributions, Vol. 17, No. 1. (1 January 2011), pp. 43-57, https://doi.org/10.1111/j.1472-4642.2010.00725.x

Abstract

MaxEnt is a program for modelling species distributions from presence-only species records. This paper is written for ecologists and describes the MaxEnt model from a statistical perspective, making explicit links between the structure of the model, decisions required in producing a modelled distribution, and knowledge about the species and the data that might affect those decisions. To begin we discuss the characteristics of presence-only data, highlighting implications for modelling distributions. We particularly focus on the problems of sample bias and lack ...

Knowledge discovery by accuracy maximization

Proceedings of the National Academy of Sciences, Vol. 111, No. 14. (April 2014), pp. 5117-5122, https://doi.org/10.1073/pnas.1220873111

Abstract

[Significance] We propose an innovative method to extract new knowledge from noisy and high-dimensional data. Our approach differs from previous methods in that it has an integrated procedure of validation of the results through maximization of cross-validated accuracy. In many cases, this method performs better than existing feature extraction methods and offers a general framework for analyzing any kind of complex data in a broad range of sciences. Examples ranging from genomics and metabolomics to astronomy and linguistics show the versatility ...

Mapping land cover from detailed aerial photography data using textural and neural network analysis

International Journal of Remote Sensing, Vol. 28, No. 7. (1 April 2007), pp. 1625-1642, https://doi.org/10.1080/01431160600887722

Abstract

Automated mapping of land cover using black and white aerial photographs, as an alternative method to traditional photo-interpretation, requires using methods other than spectral analysis classification. To this end, textural measurements have been shown to be useful indicators of land cover. In this work, a neural network model is proposed and tested to map historical land use/land cover (LUC) from very detailed panchromatic aerial photographs (5 m resolution) using textural measurements. The method is used to identify different land use and management ...

Vulnerability of Pinus cembra L. in the Alps and the Carpathian mountains under present and future climates

Forest Ecology and Management, Vol. 259, No. 4. (05 February 2010), pp. 750-761, https://doi.org/10.1016/j.foreco.2009.10.001

Abstract

Proactive management should be applied within a forest conservation context to prevent extinction or degradation of those forest ecosystems that we suspect will be affected by global warming in the next century. The aim of this study is to estimate the vulnerability under climate change of a localized and endemic tree species Pinus cembra that occurs in the alpine timberline. We used the Random Forest ensemble classifier and available bioclimatic and ecological data to model present and future suitable areas for ...

Novel methods improve prediction of species' distributions from occurrence data

Ecography, Vol. 29, No. 2. (1 April 2006), pp. 129-151, https://doi.org/10.1111/j.2006.0906-7590.04596.x

Abstract

Prediction of species’ distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only ...

The Need for Open Source Software in Machine Learning

Journal of Machine Learning Research, Vol. 8 (December 2007), pp. 2443-2466

Abstract

Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not used, since existing implementations are not openly shared, resulting in software with low usability, and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to ...

Batch mode reinforcement learning based on the synthesis of artificial trajectories

In Annals of Operations Research, Vol. 208, No. 1. (2013), pp. 383-416, https://doi.org/10.1007/s10479-012-1248-5

Abstract

In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the ...

Waffles: A Machine Learning Toolkit

Journal of Machine Learning Research, Vol. 12 (2011), pp. 2383-2387

Abstract

We present a breadth-oriented collection of cross-platform command-line tools for researchers in machine learning called Waffles. The Waffles tools are designed to offer a broad spectrum of functionality in a manner that is friendly for scripted automation. All functionality is also available in a C++ class library. Waffles is available under the GNU Lesser General Public License. ...

A few useful things to know about machine learning

Communications of the ACM, Vol. 55, No. 10. (01 October 2012), pp. 78-87, https://doi.org/10.1145/2347736.2347755

Abstract

Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers ...

The rise and fall of supervised machine learning techniques

Bioinformatics, Vol. 27, No. 24. (15 December 2011), pp. 3331-3332, https://doi.org/10.1093/bioinformatics/btr585

Abstract

Machine learning is of immense importance in bioinformatics and biomedical science more generally (Larrañaga et al., 2006; Tarca et al., 2007). In particular, supervised machine learning has been used to great effect in numerous bioinformatics prediction methods. Through many years of editing and reviewing manuscripts, we noticed that some supervised machine learning techniques seem to be gaining in popularity while others seemed, at least to our eyes, to be looking ‘unfashionable’. ...

Q-learning

Machine Learning, Vol. 8, No. 3-4. (1 May 1992), pp. 279-292, https://doi.org/10.1007/bf00992698

Abstract

Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long ...
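
The incremental update the abstract refers to is Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)]. A toy sketch (our example; the chain environment, learning rate, and exploration constants are arbitrary):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy deterministic chain: actions 0 (left) and
    1 (right); reward 1 only on entering the terminal right end."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: Q[s][i])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # Incremental dynamic-programming update toward r + gamma * max Q(s')
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy over the non-terminal states: 1 = "move right" everywhere.
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(4)]
```

On this chain the learned action-values converge to the discounted distances to the goal (Q[3][1] → 1, Q[2][1] → 0.9, and so on), matching the convergence result the paper proves.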

Dynamic programming applications in water resources

Water Resources Research, Vol. 18, No. 4. (1982), https://doi.org/10.1029/wr018i004p00673

Abstract

The central intention of this survey is to review dynamic programming models for water resource problems and to examine computational techniques which have been used to obtain solutions to these problems. Problem areas surveyed here include aqueduct design, ...