Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

4h 47m
Robert Ilijason
Apress
2020

Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large amounts of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn

Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology, and how the cloud and Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of a cost or free

This book is for data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

About the Author

Robert Ilijason is a 20-year veteran in the business intelligence (BI) segment. He has worked as a contractor for some of Europe’s biggest companies and has conducted large-scale analytics projects within the areas of retail, telecom, banking, government, and more. Robert has seen his share of analytic trends come and go over the years, but unlike most of them, he strongly believes that Apache Spark in the cloud, especially with Azure Databricks, is a game changer.

In this Book

Introduction to Large-Scale Data Analytics
Spark and Databricks
Getting Started with Databricks
Workspaces, Clusters, and Notebooks
Getting Data into Databricks
Querying Data Using SQL
The Power of Python
ETL and Advanced Data Wrangling
Connecting to and from Databricks
Running in Production
Bits and Pieces

FREE ACCESS

Book Practical Machine Learning with Spark: Uncover Apache Spark's Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML

Course Data Engineering on Microsoft Azure: Databricks

(63)

Course Azure Databricks & Data Pipelines

(42)

PEOPLE WHO VIEWED THIS ALSO VIEWED THESE

Course Python Development: Getting Started with Programming in Python

(2165)

Book Beginning DAX with Power BI: The SQL Pro's Guide to Better Business Intelligence

Book Beginning Apache Spark 3

Get Started

Sharpen your skills. Upgrade your career. Find the right learning path for you, based on your role and skills. Take part in hands-on practice, study for a certification, and much more - all personalized for you.

*Not included: Compliance, Leadership Development Program content, and Engineering books

Your content + our content + our platform = a path to learning success

Using our learning experience platform, Percipio, your learners can engage in custom learning paths that can feature curated content from all sources.

Learn More

Aspire to something bigger

Aspire Journeys are guided learning paths that set you in motion for career success.

Browse Aspire Journeys

Explore a world of live learning with Global Knowledge

Choose from convenient delivery formats to get the training you and your team need - where, when and how you want it.

Browse Live Learning

IT Skills and Salary Report

ESG Impact Report

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

In this Book

YOU MIGHT ALSO LIKE

PEOPLE WHO VIEWED THIS ALSO VIEWED THESE