Data Quality for Analytics Using SAS
- 6h 41m
- Gerhard Svolba
- SAS Institute
- 2012
Analytics offers many capabilities and options to measure and improve data quality, and SAS is perfectly suited to these tasks. Gerhard Svolba's Data Quality for Analytics Using SAS focuses on selecting the right data sources and ensuring data quantity, relevancy, and completeness. The book is made up of three parts. The first part is conceptual: it defines data quality and supports the definitions with explanations and examples. The second part shows how to profile the data quality status and how to improve data quality with analytical methods. The final part details the consequences of poor data quality for predictive modeling and time series forecasting.
With this book, you will learn how to use SAS to perform advanced profiling of data quality status and how SAS can help improve your data quality.
In this Book
- Introductory Case Studies
- Definition and Scope of Data Quality for Analytics
- Data Availability
- Data Quantity
- Data Completeness
- Data Correctness
- Predictive Modeling
- Analytics for Data Quality
- Process Considerations for Data Quality
- Profiling and Imputation of Missing Values
- Profiling and Replacement of Missing Data in a Time Series
- Data Quality Control Across Related Tables
- Data Quality with Analytics
- Data Quality Profiling and Improvement with SAS Analytic Tools
- Introduction to Simulation Studies
- Simulating the Consequences of Poor Data Quality for Predictive Modeling
- Influence of Data Quantity and Data Availability on Model Quality in Predictive Modeling
- Influence of Data Completeness on Model Quality in Predictive Modeling
- Influence of Data Correctness on Model Quality in Predictive Modeling
- Simulating the Consequences of Poor Data Quality in Time Series Forecasting
- Consequences of Data Quantity and Data Completeness in Time Series Forecasting
- Consequences of Random Disturbances in Time Series Data
- Consequences of Systematic Disturbances in Time Series Data