Amundsen Joins LF AI as New Incubation Project

August 11, 2020

LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), today is announcing Amundsen as its latest Incubation Project. 

Amundsen is a data discovery and metadata engine that improves the productivity of data analysts, data scientists, and engineers when they interact with data. It does so today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns (e.g., highly queried tables show up earlier than less queried ones). Think of it as Google search for data. The project is named after Norwegian explorer Roald Amundsen, who led the first expedition to reach the South Pole. Amundsen was released and open sourced by Lyft.
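As a rough illustration of usage-based ranking, the sketch below filters a small in-memory catalog by a search term and orders the matches by how often each table has been queried. It is not Amundsen's implementation: the `TableEntry` class, the `query_count` field, the `search` function, and the sample table names are all hypothetical, invented only to show the idea of surfacing highly queried tables first.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    name: str
    description: str
    query_count: int  # hypothetical usage signal: how often the table was queried


def search(entries, term):
    """Return entries whose name or description contains `term`, most-queried first."""
    term = term.lower()
    matches = [
        e for e in entries
        if term in e.name.lower() or term in e.description.lower()
    ]
    # Sort by usage, descending; ties are broken alphabetically by name.
    return sorted(matches, key=lambda e: (-e.query_count, e.name))


# Hypothetical catalog used only for this example.
catalog = [
    TableEntry("rides_raw", "Unprocessed ride events", 12),
    TableEntry("rides_daily", "Daily ride aggregates", 340),
    TableEntry("drivers", "Driver dimension table", 95),
]

results = [e.name for e in search(catalog, "ride")]
print(results)  # ['rides_daily', 'rides_raw']
```

Here the heavily queried `rides_daily` table is listed ahead of `rides_raw`, even though both match the term equally well. A production search engine would combine a usage signal like this with text relevance rather than sorting on usage alone.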

Dr. Ibrahim Haddad, Executive Director of LF AI, said: “We are very excited to welcome Amundsen to LF AI and help it thrive in a neutral, vendor-free environment under an open governance model. With the addition of Amundsen, we are increasing the number of hosted projects under the Data category and look forward to tighter collaboration between our data projects and all other projects to drive innovation in data, analytics, and AI open source technologies.” LF AI supports projects via a wide range of services, and the first step is joining as an Incubation Project. 

Mark Grover, co-creator of Amundsen, said: “Becoming a part of the LF AI Foundation is a big milestone for the project in its journey towards becoming the de-facto open-source data discovery and metadata engine. It’s been amazing to see the adoption of Amundsen at Lyft, as well as the growth of its open source community, which now has over 750 members. I am excited to see the project’s continued growth and success with the support of the LF AI Foundation.”

Amundsen, which is published under the Apache License, Version 2.0, includes three microservices, one data ingestion library and one common library (full code):

  • amundsen: Central repo for Amundsen.
  • amundsenfrontendlibrary: Frontend service which is a Flask application with a React frontend.
  • amundsensearchlibrary: Search service, which uses Elasticsearch to power metadata search in the frontend.
  • amundsenmetadatalibrary: Metadata service, which leverages Neo4j or Apache Atlas as the persistent layer, to provide various metadata.
  • amundsendatabuilder: Data ingestion library for building the metadata graph and search index. Users can load data either with a Python script that uses the library or with an Airflow DAG that imports it.
  • amundsencommon: Common library that holds code shared among Amundsen's microservices.

The project’s growing user community now includes Lyft, ING, Square, Workday, Asana, iRobot, Edmunds.com, and many more. Amundsen started out by enabling discovery and exploration of data sets, and has since added dashboards and people to its metadata graph. It integrates with a large ecosystem of data stores, dashboarding tools, and orchestration tools (like Airflow). You can learn more about Amundsen at amundsen.io.

Bolke de Bruin, VP Engineering Advanced Analytics at ING Wholesale Banking, said: “At ING we are early adopters of and contributors to Amundsen, the Google search for data, created by Lyft. When we got in touch with the Lyft team in September 2018, we were immediately captured by their vision. In the metadata space many applications focus on data governance. This is important, but only of limited use for data scientists, analysts and engineers. Amundsen is focused on increasing the productivity of these data users and reducing the friction they face while trying to find and understand the data they work with. At ING, Amundsen is part of our data analytics platform, which targets 50% of the company as its users. Amundsen is key in reaching this target as it’s lowering the barrier to entry. 500 users make use of Amundsen daily and this number is continuously growing.”

Chris Martin, VP of Data, Science & ML at Lyft, said: “Amundsen has become a crucial part of data science and analyst workflows, being used by over a thousand users every month. It has significantly improved productivity and trust in data for data users at Lyft.”

LF AI will support neutral open governance for Amundsen to help foster the growth of the project. Check out the User Guide to start working with Amundsen today. Learn more about Amundsen on its website, and be sure to join the Amundsen-Announce and Amundsen-Technical-Discuss mailing lists to connect with the community and stay up to date on the latest news.

A warm welcome to Amundsen! We look forward to the project’s continued growth and success as part of the LF AI Foundation. To learn about how to host an open source project with us, visit the LF AI website.
