Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Second Edition

  • 7h 54m
  • Bruce Ratner
  • CRC Press
  • 2012

The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has completely revised, reorganized, and repositioned the original chapters and produced 14 new chapters of creative and useful machine-learning data mining techniques. In sum, the 31 chapters of simple yet insightful quantitative techniques make this book unique in the field of data mining literature.

The statistical data mining methods effectively consider big data for identifying structures (variables) with the appropriate predictive power in order to yield reliable and robust large-scale statistical models and analyses. In contrast, the author's own GenIQ Model provides machine-learning solutions to common and virtually unapproachable statistical problems. GenIQ makes this possible — its utilitarian data mining features start where statistical data mining stops.

This book contains essays offering detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. They address each methodology and assign its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.

About the Author

Bruce Ratner, PhD, The Significant Statistician, is president and founder of DM STAT-1 Consulting, the ensample for statistical modeling, analysis and data mining, and machine-learning data mining in the DM Space. DM STAT-1 specializes in all standard statistical techniques and methods using machine-learning/statistics algorithms, such as its patented GenIQ Model, to achieve its clients' goals, across industries including direct and database marketing, banking, insurance, finance, retail, telecommunications, health care, pharmaceutical, publication and circulation, mass and direct advertising, catalog marketing, e-commerce, Web mining, B2B (business to business), human capital management, risk management, and nonprofit fund-raising.

Bruce's par excellence consulting expertise is apparent, as he is the author of the best-selling book Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data. Bruce ensures his clients' marketing decision problems are solved with the optimal problem solution methodology and rapid startup and timely delivery of project results. Client projects are executed with the highest level of statistical practice. He is an often-invited speaker at public industry events, such as the SAS Data Mining Conference, and private seminars at the request of Fortune magazine's top 100 companies.

Bruce has his footprint in the predictive analytics community as a frequent speaker at industry conferences and as the instructor of the advanced statistics course sponsored by the Direct Marketing Association for over a decade. He is the author of over 100 peer-reviewed articles on statistical and machine-learning procedures and software tools. He is a coauthor of the popular textbook The New Direct Marketing and is on the editorial board of the Journal of Database Marketing.

Bruce is also active in the online data mining industry. He is a frequent contributor to KDnuggets Publications, the top resource of the data mining community. His articles on statistical and machine-learning methodologies draw a huge monthly following. Another online venue in which he participates is the professional network LinkedIn. His seminal articles posted on LinkedIn, covering statistical and machine-learning procedures for big data, have sparked countless rich discussions. In addition, he is the author of his own DM STAT-1 Newsletter on the Web.

Bruce holds a doctorate in mathematics and statistics, with a concentration in multivariate statistics and response model simulation. His research interests include developing hybrid modeling techniques, which combine traditional statistics and machine-learning methods. He holds a patent for a unique application in solving the two-group classification problem with genetic programming.

In this Book

  • Introduction
  • Two Basic Data Mining Methods for Variable Assessment
  • CHAID-Based Data Mining for Paired-Variable Assessment
  • The Importance of Straight Data—Simplicity and Desirability for Good Model-Building Practice
  • Symmetrizing Ranked Data—A Statistical Data Mining Method for Improving the Predictive Power of Data
  • Principal Component Analysis—A Statistical Data Mining Method for Many-Variable Assessment
  • The Correlation Coefficient—Its Values Range between Plus/Minus 1, or Do They?
  • Logistic Regression—The Workhorse of Response Modeling
  • Ordinary Regression—The Workhorse of Profit Modeling
  • Variable Selection Methods in Regression—Ignorable Problem, Notable Solution
  • CHAID for Interpreting a Logistic Regression Model
  • The Importance of the Regression Coefficient
  • The Average Correlation—A Statistical Data Mining Measure for Assessment of Competing Predictive Models and the Importance of the Predictor Variables
  • CHAID for Specifying a Model with Interaction Variables
  • Market Segmentation Classification Modeling with Logistic Regression
  • CHAID as a Method for Filling in Missing Values
  • Identifying Your Best Customers—Descriptive, Predictive, and Look-Alike Profiling
  • Assessment of Marketing Models
  • Bootstrapping in Marketing—A New Approach for Validating Models
  • Validating the Logistic Regression Model—Try Bootstrapping
  • Visualization of Marketing Models: Data Mining to Uncover Innards of a Model
  • The Predictive Contribution Coefficient—A Measure of Predictive Importance
  • Regression Modeling Involves Art, Science, and Poetry, Too
  • Genetic and Statistic Regression Models—A Comparison
  • Data Reuse—A Powerful Data Mining Effect of the GenIQ Model
  • A Data Mining Method for Moderating Outliers Instead of Discarding Them
  • Overfitting—Old Problem, New Solution
  • The Importance of Straight Data—Revisited
  • The GenIQ Model—Its Definition and an Application
  • Finding the Best Variables for Marketing Models
  • Interpretation of Coefficient-Free Models