Thinking Data Science: A Data Science Practitioner's Guide
- 3h 44m
- Poornachandra Sarang
- Springer
- 2023
This definitive guide to Machine Learning projects answers the questions that aspiring and experienced data scientists frequently ask: Confused about which technology to use for your ML development? Should I use GOFAI, ANN/DNN, or Transfer Learning? Can I rely on AutoML for model development? What if the client provides me with gigabytes or terabytes of data for developing analytic models? How do I handle high-frequency dynamic datasets? This book gives the practitioner a consolidation of the entire data science process in a single “Cheat Sheet”.
The challenge for a data scientist is to extract meaningful information from huge datasets that will help businesses create better strategies. Many Machine Learning algorithms and Neural Networks are designed to perform analytics on such datasets. For a data scientist, deciding which algorithm to use for a given dataset is daunting. Although there is no single answer to this question, a systematic approach to problem solving is necessary. This book describes the various ML algorithms conceptually and defines and discusses a process for selecting ML/DL models. The key aspect of this book is its consolidation of the available algorithms and techniques for designing efficient ML models. Thinking Data Science will help practising data scientists, academicians, researchers, and students who want to build ML models using the appropriate algorithms and architectures, whether the data is small or big.
About the Author
Poornachandra Sarang, in his IT career spanning four decades, has consulted for large IT organizations on the design and architecture of systems using state-of-the-art technologies. He has authored several books covering a wide range of emerging technologies. Dr. Sarang is a Ph.D. advisor for Computer Science and Engineering and serves on the thesis advisory committee for aspiring doctoral candidates. He has designed and delivered postgraduate-level courses and curricula for universities, as well as courses and workshops on emerging technologies for industry. He is a familiar face at technical and research conferences, delivering both keynote and technical talks.
In this Book
- Data Science Process
- Dimensionality Reduction: Creating Manageable Training Datasets
- Regression Analysis: A Well-Studied Statistical Technique for Predictive Analysis
- Decision Tree: A Supervised Learning Algorithm for Classification
- Ensemble: Bagging and Boosting: Improving Decision Tree Performance by Ensemble Methods
- K-Nearest Neighbors: A Supervised Learning Algorithm for Classification and May Be Regression
- Naive Bayes: A Supervised Learning Algorithm for Classification
- Support Vector Machines: A Supervised Learning Algorithm for Classification and Regression
- Centroid-Based Clustering: Clustering Algorithms for Hard Clustering
- Connectivity-Based Clustering: Clustering Built on a Tree-Type Structure
- Gaussian Mixture Model: A Probabilistic Clustering Model for Datasets with Mixture of Gaussian Blobs
- Density-Based Clustering: Density-Based Spatial Clustering
- BIRCH: Divide and Conquer
- CLARANS: Clustering Large Datasets with Randomized Search
- Affinity Propagation Clustering: A Gossip-Style Algorithm for Clustering
- STING & CLIQUE: Density and Grid Based Clustering
- Artificial Neural Networks: A Noticeable Evolution in AI
- ANN-Based Applications: Text and Image Dataset Processing for ANN Applications
- Automated Tools: Data Scientist’s Aid for Designing Classical and ANN-Based Models
- Data Scientist’s Ultimate Workflow: A Quick Summary on a Data Scientist’s Approach to Model Development