Meetup: How to Collaborate with Data Lake Management Communities
Ivett Sanchez
Tue, 03/02/2021 - 18:01
Start date
11-Mar-2021
Venue
Online
Website
https://www.crowdcast.io/e/data-lake-management/register
Talk 1: Modernizing Apache Hive Metastore for the Next Decade by Feng Lu
Talk 2: How Delta Lake Address Data Lake Challenges by Ajay Singh
Moderator: Chris Crosbie
Abstracts
Modernizing Apache Hive Metastore for the Next Decade by Feng Lu
The Apache Hive project was introduced in 2010 and has enjoyed widespread popularity in the community for a decade. As the de facto technical metadata standard, Apache Hive Metastore is a critical backbone for building data lake applications. The recent need to provide ACID properties, time travel, end-to-end security, and streaming support on tabular datasets renders the Hive Metastore data model inadequate. We present a number of open source initiatives at Google that aim to uplift and rejuvenate Hive Metastore for the next decade, and explain their design rationales: first-class versioned metadata support for new table formats (e.g., Iceberg, Delta), a native gRPC interface for Hive Metastore API access, and running Hive Metastore on cloud-native databases like Cloud Spanner. We will wrap up the talk with development progress and a call for community contributions.
How Delta Lake Address Data Lake Challenges by Ajay Singh
Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming …