Principles of Data Mining
- 10h 48m
- David Hand, Heikki Mannila, Padhraic Smyth
- The MIT Press
- 2001
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics.
The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.
About the Authors
David Hand is Professor of Statistics, Department of Mathematics, Imperial College, London.
Heikki Mannila is Research Fellow at Nokia Research Center and Professor, Department of Computer Science and Engineering, Helsinki University of Technology.
Padhraic Smyth is Associate Professor, Department of Information and Computer Science, the University of California, Irvine.
In this Book
- Series Foreword
- Introduction
- Measurement and Data
- Visualizing and Exploring Data
- Data Analysis and Uncertainty
- A Systematic Overview of Data Mining Algorithms
- Models and Patterns
- Score Functions for Data Mining Algorithms
- Search and Optimization Methods
- Descriptive Modeling
- Predictive Modeling for Classification
- Predictive Modeling for Regression
- Data Organization and Databases
- Finding Patterns and Rules
- Retrieval by Content