Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

12h 23m
Gary Miner, et al.
Elsevier Science and Technology Books, Inc.
2012

The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically.

This comprehensive professional reference brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis.

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real-world example tutorials in such varied fields as corporate finance, business intelligence, genomics research, and counterterrorism activities.

Extensive case studies, most in a tutorial format, allow the reader to 'click through' the example using a software program, and so learn to conduct text mining analyses as quickly as possible.
Numerous examples, tutorials, PowerPoint slides, and datasets are available via the companion website on Elsevierdirect.com.
A glossary of text mining terms is provided in the appendix.

About the Authors

Dr. Gary Miner received a B.S. from Hamline University, St. Paul, Minnesota, with Biology, Chemistry, and Education majors; an M.S. in Zoology and Population Genetics from the University of Wyoming; and a Ph.D. in biochemical genetics from the University of Kansas as the recipient of a NASA predoctoral fellowship. During the doctoral study years, he also studied mammalian genetics at the Jackson Laboratory, Bar Harbor, Maine, under a College Training Program on an NIH award; another College Training Program at the Bermuda Biological Station, St. George's West, Bermuda, in a Marine Developmental Embryology course, on an NSF award; and a third College Training Program held at the University of California, San Diego, at the Molecular Techniques in Developmental Biology Institute, again on an NSF award. Following that, he studied as a postdoctoral student at the University of Minnesota in behavioral genetics, where, along with research in schizophrenia and Alzheimer's disease, he learned what was involved in writing books by assisting in editing two book manuscripts of his mentor, Irving Gottesman, Ph.D.

Dr. John Elder heads the United States' leading data mining consulting team, with offices in Charlottesville, Virginia; Washington, D.C.; Baltimore, Maryland; and Manhasset, New York (www.datamininglab.com). Founded in 1995, Elder Research, Inc. focuses on investment, commercial, and security applications of advanced analytics, including text mining, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market sector timing, and fraud detection. John obtained a B.S. and an M.E.E. in electrical engineering from Rice University and a Ph.D. in systems engineering from the University of Virginia, where he is an adjunct professor teaching optimization or data mining. Prior to 16 years at ERI, he spent five years in aerospace defense consulting, four years heading research at an investment management firm, and two years in Rice's Computational & Applied Mathematics Department.

Thomas Hill received his Vordiplom in psychology from Kiel University in Germany and earned an M.S. in industrial psychology and a Ph.D. in psychology and quantitative methods from the University of Kansas. He was associate professor (and then research professor) at the University of Tulsa from 1984 to 2009, where he taught data analysis and data mining courses. He has also been vice president for Research and Development and then Analytic Solutions at StatSoft Inc., where he has been involved for over 20 years in the development of data analysis, data and text mining algorithms, and the delivery of analytic solutions. Dr. Hill has received numerous academic grants and awards from the National Science Foundation, the National Institutes of Health, the Center for Innovation Management, the Electric Power Research Institute, and other institutions. He has completed diverse consulting projects with companies from practically all industries and has worked with the leading financial services, insurance, manufacturing, pharmaceutical, retailing, and other companies in the United States and internationally on identifying and refining effective data mining and predictive modeling solutions for diverse applications.

Dr. Nisbet was trained initially in ecosystems analysis. He has over 30 years of experience in complex systems analysis and modeling as a researcher (University of California, Santa Barbara). He entered business in 1994 to lead the team that developed the first data mining models of customer response for AT&T and NCR Corporation. While at NCR Corporation and Torrent Systems, he pioneered the design and development of configurable data mining applications for retail sales forecasting and Churn, Propensity-to-buy, and Customer Acquisition in Telecommunications and Insurance. In addition to data mining, he has expertise in data warehousing technology for Extract, Transform, and Load (ETL) operations; business intelligence reporting; and data quality analyses. He is lead author of the Handbook of Statistical Analysis & Data Mining Applications (Academic Press, 2009). Currently, he functions as a data scientist and independent data mining consultant.

Dr. Dursun Delen is the William S. Spears Chair in Business Administration and Associate Professor of Management Science and Information Systems in the Spears School of Business at Oklahoma State University (OSU). He received his Ph.D. in industrial engineering and management from OSU in 1997. Prior to his appointment as an assistant professor at OSU in 2001, he worked for a privately owned research and consultancy company, Knowledge Based Systems Inc., in College Station, Texas, as a research scientist for five years, during which he led a number of decision support and other information systems-related research projects funded by federal agencies, including DoD, NASA, NIST and DOE.

Dr. Andrew Fast leads research in text mining and social network analysis at Elder Research. Dr. Fast graduated magna cum laude from Bethel University and earned an M.S. and a Ph.D. in computer science from the University of Massachusetts Amherst. There, his research focused on causal data mining and mining complex relational data such as social networks. At ERI, Andrew leads the development of new tools and algorithms for data and text mining for applications of capabilities assessment, fraud detection, and national security. Dr. Fast has published on an array of applications, including detecting securities fraud using the social network among brokers and understanding the structure of criminal and violent groups. Other publications cover modeling peer-to-peer music file sharing networks, understanding how collective classification works, and predicting playoff success of NFL head coaches (work featured on ESPN.com).

In this Book

Foreword 1
Foreword 2
Foreword 3
Introduction
List of Tutorials by Guest Authors
The History of Text Mining
The Seven Practice Areas of Text Analytics
Conceptual Foundations of Text Mining and Preprocessing Steps
Applications and Use Cases for Text Mining
Text Mining Methodology
Three Common Text Mining Software Tools
Case Study—Using the Social Share of Voice to Predict Events That are about to Happen
Mining Twitter for Airline Consumer Sentiment
Using STATISTICA Text Miner to Monitor and Predict Success of Marketing Campaigns Based on Social Media Data
Text Mining Improves Model Performance in Predicting Airplane Flight Accident Outcome
Insurance Industry—Text Analytics Adds “Lift” to Predictive Models with STATISTICA Text and Data Miner
Analysis of Survey Data for Establishing the “Best Medical Survey Instrument” Using Text Mining
Analysis of Survey Data for Establishing “Best Medical Survey Instrument” Using Text Mining—Central Asian (Russian Language) Study Tutorial 2—Potential for Constructing Instruments That Have Increased Validity
Using eBay Text for Predicting ATLAS Instrumental Learning
Text Mining for Patterns in Children's Sleep Disorders Using STATISTICA Text Miner
Extracting Knowledge from Published Literature Using RapidMiner
Text Mining Speech Samples—Can the Speech of Individuals Diagnosed with Schizophrenia Differentiate Them from Unaffected Controls?
Text Mining Using STM™, CART®, and TreeNet® from Salford Systems—Analysis of 16,000 iPod Auctions on eBay
Predicting Micro Lending Loan Defaults Using SAS® Text Miner
Opera Lyrics—Text Analytics Compared by the Composer and the Century of Composition—Wagner versus Puccini
Case Study—Sentiment-Based Text Analytics to Better Predict Customer Satisfaction and Net Promoter® Score Using IBM®SPSS® Modeler
Case Study—Detecting Deception in Text with Freely Available Text and Data Mining Tools
Predicting Box Office Success of Motion Pictures with Text Mining
A Hands-On Tutorial of Text Mining in PASW—Clustering and Sentiment Analysis Using Tweets from Twitter
A Hands-On Tutorial on Text Mining in SAS®—Analysis of Customer Comments for Clustering and Predictive Modeling
Scoring Retention and Success of Incoming College Freshmen Using Text Analytics
Searching for Relationships in Product Recall Data from the Consumer Product Safety Commission with STATISTICA Text Miner
Potential Problems That Can Arise in Text Mining—Example Using NALL Aviation Data
Exploring the Unabomber Manifesto Using Text Miner
Text Mining PubMed—Extracting Publications on Genes and Genetic Markers Associated with Migraine Headaches from PubMed Abstracts
Case Study—The Problem with the Use of Medical Abbreviations by Physicians and Health Care Providers
Classifying Documents with Respect to “Earnings” and Then Making a Predictive Model for the Target Variable Using Decision Trees, MARSplines, Naïve Bayes Classifier, and K-Nearest Neighbors with STATISTICA Text Miner
Case Study—Predicting Exposure of Social Messages—The Bin Laden Live Tweeter
The InFLUence Model—Web Crawling, Text Mining, and Predictive Analysis with 2010–2011 Influenza Guidelines—CDC, IDSA, WHO, and FMC
Text Classification and Categorization
Prediction in Text Mining—The Data Mining Algorithms of Predictive Analytics
Entity Extraction
Feature Selection and Dimensionality Reduction
Singular Value Decomposition in Text Mining
Web Analytics and Web Mining
Clustering Words and Documents
Leveraging Text Mining in Property and Casualty Insurance
Focused Web Crawling
The Future of Text and Web Analytics
Summary
Glossary
How to Use the Data Sets and the Text Mining Software on the DVD or on Links for Practical Text Mining
