97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts
- Length: 5h 41m 11s
- Editor: Tobias Macey
- Publisher: Gildan Media
- Year: 2021
Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges.
Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents ninety-seven concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include:
- The Importance of Data Lineage - Julien Le Dem
- Data Security for Data Engineers - Katharine Jarmul
- The Two Types of Data Engineering and Data Engineers - Jesse Anderson
- Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy
- The End of ETL as We Know It - Paul Singman
About the Author
Tobias Macey hosts the Data Engineering Podcast and Podcast.\_\_init\_\_, where he discusses the tools, topics, and people that make up the data engineering and Python communities, respectively. His experience across infrastructure, software, cloud, and data engineering allows him to ask informed questions and bring useful context to the discussions. The ongoing focus of his career is helping to educate people by designing and building platforms that power online learning, consulting with companies and investors on the possibilities of emerging technologies, and leading teams of engineers to help them grow professionally.
In This Audiobook
- Chapter 1 - A (Book) Case for Eventual Consistency
- Chapter 2 - A/B and How to Be
- Chapter 3 - About the Storage Layer
- Chapter 4 - Analytics as the Secret Glue for Microservice Architectures
- Chapter 5 - Automate Your Infrastructure
- Chapter 6 - Automate Your Pipeline Tests
- Chapter 7 - Be Intentional About the Batching Model in Your Data Pipelines
- Chapter 8 - Beware of Silver-Bullet Syndrome
- Chapter 9 - Building a Career as a Data Engineer
- Chapter 10 - Business Dashboards for Data Pipelines
- Chapter 11 - Caution: Data Science Projects Can Turn into the Emperor's New Clothes
- Chapter 12 - Change Data Capture
- Chapter 13 - Column Names as Contracts
- Chapter 14 - Consensual, Privacy-Aware Data Collection
- Chapter 15 - Cultivate Good Working Relationships with Data Consumers
- Chapter 16 - Data Engineering != Spark
- Chapter 17 - Data Engineering for Autonomy and Rapid Innovation
- Chapter 18 - Data Engineering from a Data Scientist’s Perspective
- Chapter 19 - Data Pipeline Design Patterns for Reusability and Extensibility
- Chapter 20 - Data Quality for Data Engineers
- Chapter 21 - Data Security for Data Engineers
- Chapter 22 - Data Validation Is More Than Summary Statistics
- Chapter 23 - Data Warehouses Are the Past, Present, and Future
- Chapter 24 - Defining and Managing Messages in Log-Centric Architectures
- Chapter 25 - Demystify the Source and Illuminate the Data Pipeline
- Chapter 26 - Develop Communities, Not Just Code
- Chapter 27 - Effective Data Engineering in the Cloud World
- Chapter 28 - Embrace the Data Lake Architecture
- Chapter 29 - Embracing Data Silos
- Chapter 30 - Engineering Reproducible Data Science Projects
- Chapter 31 - Five Best Practices for Stable Data Processing
- Chapter 32 - Focus on Maintainability and Break Up Those ETL Tasks
- Chapter 33 - Friends Don’t Let Friends Do Dual-Writes
- Chapter 34 - Fundamental Knowledge
- Chapter 35 - Getting the “Structured” Back into SQL
- Chapter 36 - Give Data Products a Frontend with Latent Documentation
- Chapter 37 - How Data Pipelines Evolve
- Chapter 38 - How to Build Your Data Platform like a Product
- Chapter 39 - How to Prevent a Data Mutiny
- Chapter 40 - Know the Value per Byte of Your Data
- Chapter 41 - Know Your Latencies
- Chapter 42 - Learn to Use a NoSQL Database, but Not like an RDBMS
- Chapter 43 - Let the Robots Enforce the Rules
- Chapter 44 - Listen to Your Users - but Not Too Much
- Chapter 45 - Low-Cost Sensors and the Quality of Data
- Chapter 46 - Maintain Your Mechanical Sympathy
- Chapter 47 - Metadata ≥ Data
- Chapter 48 - Metadata Services as a Core Component of the Data Platform
- Chapter 49 - Mind the Gap: Your Data Lake Provides No ACID Guarantees
- Chapter 50 - Modern Metadata for the Modern Data Stack
- Chapter 51 - Most Data Problems Are Not Big Data Problems
- Chapter 52 - Moving from Software Engineering to Data Engineering
- Chapter 53 - Observability for Data Engineers
- Chapter 54 - Perfect Is the Enemy of Good
- Chapter 55 - Pipe Dreams
- Chapter 56 - Preventing the Data Lake Abyss
- Chapter 57 - Prioritizing User Experience in Messaging Systems
- Chapter 58 - Privacy Is Your Problem
- Chapter 59 - QA and All Its Sexiness
- Chapter 60 - Seven Things Data Engineers Need to Watch Out for in ML Projects
- Chapter 61 - Six Dimensions for Picking an Analytical Data Warehouse
- Chapter 62 - Small Files in a Big Data World
- Chapter 63 - Streaming Is Different from Batch
- Chapter 64 - Tardy Data
- Chapter 65 - Tech Should Take a Back Seat for Data Project Success
- Chapter 66 - Ten Must-Ask Questions for Data-Engineering Projects
- Chapter 67 - The Data Pipeline Is Not About Speed
- Chapter 68 - The Dos and Don’ts of Data Engineering
- Chapter 69 - The End of ETL as We Know It
- Chapter 70 - The Haiku Approach to Writing Software
- Chapter 71 - The Hidden Cost of Data Input/Output
- Chapter 72 - The Holy War Between Proprietary and Open Source Is a Lie
- Chapter 73 - The Implications of the CAP Theorem
- Chapter 74 - The Importance of Data Lineage
- Chapter 75 - The Many Meanings of Missingness
- Chapter 76 - The Six Words That Will Destroy Your Career
- Chapter 77 - The Three Invaluable Benefits of Open Source for Testing Data Quality
- Chapter 78 - The Three Rs of Data Engineering
- Chapter 79 - The Two Types of Data Engineering and Data Engineers
- Chapter 80 - The Yin and Yang of Big Data Scalability
- Chapter 81 - Threading and Concurrency in Data Processing
- Chapter 82 - Three Important Distributed Programming Concepts
- Chapter 83 - Time (Semantics) Won’t Wait
- Chapter 84 - Tools Don’t Matter, Patterns and Practices Do
- Chapter 85 - Total Opportunity Cost of Ownership
- Chapter 86 - Understanding the Ways Different Data Domains Solve Problems
- Chapter 87 - What Is a Data Engineer? Clue: We’re Data Science Enablers
- Chapter 88 - What Is a Data Mesh, and How Not to Mesh It Up
- Chapter 89 - What Is Big Data?
- Chapter 90 - What to Do When You Don’t Get Any Credit
- Chapter 91 - When Our Data Science Team Didn’t Produce Value
- Chapter 92 - When to Avoid the Naive Approach
- Chapter 93 - When to Be Cautious About Sharing Data
- Chapter 94 - When to Talk and When to Listen
- Chapter 95 - Why Data Science Teams Need Generalists - Not Specialists
- Chapter 96 - With Great Data Comes Great Responsibility
- Chapter 97 - Your Data Tests Failed! Now What?