Data Lakes

  • 3h 17m
  • Anne Laurent, Dominique Laurent, Cédrine Madera (eds)
  • John Wiley & Sons (US)
  • 2020

The concept of a data lake is less than 10 years old, but data lakes are already widely implemented within large companies. Their goal is to deal efficiently with ever-growing volumes of heterogeneous data while also meeting varied and sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata – supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, providing the reader with practical examples of data lake management.

About the Authors

Anne Laurent is a Full Professor at the University of Montpellier, France, and teaches at the Polytech Montpellier Engineering School. She is also a member of the LIRMM laboratory at the University of Montpellier, where she works on the semantic web, data mining, data warehousing, data lakes and fuzzy logic. Dominique Laurent is Emeritus Professor at Cergy-Pontoise University, France. He is a member of the ETIS-CNRS laboratory, and his main research interests include database theory, database updates, data mining and data warehousing. Cédrine Madera is an Executive Information Architect at IBM, France. She holds a doctorate in data science and, in close collaboration with academia, works on the evolution of information systems.

In this Book

  • Introduction to Data Lakes—Definitions and Discussions
  • Architecture of Data Lakes
  • Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
  • Metadata in Data Lake Ecosystems
  • A Use Case of Data Lake Metadata Management
  • Master Data and Reference Data in Data Lake Ecosystems
  • Linked Data Principles for Data Lakes
  • Fog Computing
  • The Gravity Principle in Data Lakes
  • Glossary
  • References