All Posts By Jacqueline Serafin

LF AI 2020 Mid Year Review

By Blog

2020 has been a busy year for the LF AI Foundation (LF AI) and we are thrilled to see the continued enthusiasm among the overall community and the growth of our hosted technical projects. With half the year behind us, we’re taking a moment to reflect on the key highlights. 

Members

LF AI launched two years ago with ten members and has now grown to 24 members across our Premier, General, and Associate levels. In the first half of 2020, we’ve seen extra momentum in our Associate member category, with several educational institutions joining us, including the Montreal AI Ethics Institute, Pranveer Institute of Technology, and Penn State Great Valley. We also welcomed two non-profit organizations, AI for People and Ambianic.ai, both of which became very active in the LF AI community right away.

It’s been great to see a diverse group of companies from various industries getting involved in LF AI. If you are interested in supporting open source projects in the artificial intelligence (AI), machine learning (ML), and deep learning (DL) space, you can learn more about membership opportunities here.

Technical Projects 

Our technical project portfolio grew to twelve projects, of which three are Graduated and nine are Incubating. At the end of June, the LF AI Technical Advisory Council (TAC) approved three additional Incubating projects in the Trusted AI space; these projects are being onboarded into the Foundation and will be formally announced soon, so stay tuned! The TAC is continually working to bring in new open source projects. If you are interested in hosting a project with LF AI, check out the proposal process here and email info@lfai.foundation to discuss further.

Interactive Landscape

The LF AI Interactive Landscape has continued to be a great tool for gaining insight into how LF AI projects, among many others, fit into the open source AI, ML, and DL space. As of the end of June, the landscape covers 248 projects from over 130 organizations and universities. These projects have collectively earned over 1.4 million GitHub stars and span over 450 million lines of code from more than 30,000 developers! Explore the landscape, and please help us expand it with your own open source project, or let us know of other projects that should be included by emailing info@lfai.foundation.

Initiatives

We are excited to have seen participation increase in two key initiatives. The ML Workflow & Interop Committee is focused on defining an ML workflow and promoting cross-project integration and interoperability. The Trusted AI Committee is focused on creating policies, guidelines, tooling, and industry use cases in this very important space. Both committees are open for participation, and we welcome anyone interested to join the conversations by subscribing to the mailing lists or attending an upcoming meeting; check out their wiki pages for more information.

Events 

Despite the challenges COVID-19 has presented for in-person gatherings, our community did not let that stop its planned events and instead pivoted to virtual formats. There have been two LF AI Days this year: the first was the ONNX Community Virtual Meetup, followed by a Virtual LF AI Day EU for those based in that region. LF AI Days are regional, one-day events hosted and organized by local members with support from LF AI and its projects. Visit our LF AI Events page for details on upcoming events, and be sure to join us for one soon!

Community

The LF AI community continues to grow! If you haven’t already, check out below a few ways to stay connected with LF AI:

Onward

We are excited to see what the second half of 2020 brings and how LF AI can influence the AI, ML, and DL space; we hope you will be a part of the journey! Check out our How to Get Involved Guide or email us at info@lfai.foundation for any questions on how to participate.

LF AI Resources

sparklyr 1.3.0 Now Available!

sparklyr, an LF AI Foundation Incubation Project, has released version 1.3.0! sparklyr is an R package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics.

In version 1.3.0, sparklyr adds a variety of improvements; highlights include:

  • sparklyr now supports seamless integration of Spark higher-order functions with R (similar to how dplyr lets R users compose clear, concise data-manipulation verbs instead of long SQL queries)
  • In response to popular demand for Apache Avro functionality in sparklyr, the spark_read_avro, spark_write_avro, sdf_from_avro, and sdf_to_avro methods were implemented to make working with Apache Avro simpler for sparklyr users (context: Apache Avro is a popular data serialization format that combines the flexibility of JSON schema definitions with the efficiency of binary serialization of data columns)
  • It is now also possible to run user-defined R serialization and deserialization procedures on Spark worker nodes through sparklyr
  • As usual, new features weren’t the only focus of the sparklyr 1.3 release; there were also a number of crucial bug fixes (as outlined in https://github.com/sparklyr/sparklyr/pull/2550)

The power of open source projects is the aggregate contributions originating from different community members and organizations that collectively help drive the advancement of the projects and their roadmaps. The sparklyr community is a great example of this process and was instrumental in producing this release. The sparklyr team wanted to give a special THANK YOU to the following community members for their contributions via pull requests (listed in chronological order):

Contributions take many forms: roadmap input for sparklyr 1.3 from Javier Luraschi (#2434 and #2552), and great insight from @mattpollock and @benmwhite on several issues (#1773, #2514). Truly a great team effort for this release!

To learn more about the sparklyr 1.3.0 release, check out the full release notes. Want to get involved with sparklyr? Be sure to join the sparklyr-Announce and sparklyr Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the sparklyr team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.

sparklyr Key Links

Marquez Joins LF AI as New Incubation Project

The LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), is today announcing Marquez as its latest Incubation Project. Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtimes and the frequency of dataset access, centralizes dataset lifecycle management, and much more.

“The Marquez community is excited to join LF AI. This is the next step for Marquez to become an integral part of the wider data community and be the standard for lineage and metadata collection,” said Julien Le Dem, CTO of Datakin.

“We are very pleased to welcome Marquez to LF AI. Machine learning requires high quality data pipelines, and Marquez gives visibility into data quality, enables reproducibility, facilitates operations, and builds accountability and trust,” said Dr. Ibrahim Haddad, Executive Director of LF AI. “We look forward to supporting this project and helping it to thrive under a neutral, vendor-free, and open governance.” LF AI supports projects via a wide range of benefits, and the first step is joining as an Incubation Project. Full details on why you should host your open source project with LF AI are available here.

Marquez enables highly flexible data lineage queries across all datasets, while reliably and efficiently associating (upstream, downstream) dependencies between jobs and the datasets they produce and consume.

Marquez is a modular system and has been designed as a highly scalable, highly extensible platform-agnostic solution for metadata management. It consists of the following system components:

  • Metadata Repository: Stores all job and dataset metadata, including a complete history of job runs and job-level statistics (e.g., total runs, average runtime, success/failure counts).
  • Metadata API: RESTful API enabling a diverse set of clients to begin collecting metadata around dataset production and consumption.
  • Metadata UI: Used for dataset discovery, connecting multiple datasets and exploring their dependency graph.

Marquez’s data model emphasizes immutability and timely processing of datasets. Datasets are first-class values produced by job runs. A job run is linked to versioned code, and produces one or more immutable versioned outputs. Dataset changes are recorded at different points in job execution via lightweight API calls, including the success or failure of the run itself.
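To make the data model described above concrete, here is a minimal, illustrative Python sketch of its core ideas: a job run linked to versioned code produces immutable, versioned dataset outputs. The class and field names are our own shorthand for this post, not Marquez's actual API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)  # frozen: a dataset version is immutable once produced
class DatasetVersion:
    name: str
    version: int
    produced_by_run: str  # lineage link back to the producing job run

@dataclass
class JobRun:
    run_id: str
    code_version: str  # link to the versioned job code (e.g., a git SHA)
    state: str = "RUNNING"
    outputs: List[DatasetVersion] = field(default_factory=list)

    def complete(self, dataset_name: str, latest_version: int) -> DatasetVersion:
        # Recording a successful run produces a NEW immutable dataset version;
        # earlier versions are never modified, so lineage history is preserved.
        out = DatasetVersion(dataset_name, latest_version + 1, self.run_id)
        self.outputs.append(out)
        self.state = "COMPLETED"
        return out

run = JobRun(run_id="run-42", code_version="git:abc123")
v2 = run.complete("orders", latest_version=1)  # produces version 2 of "orders"
```

In the real service these state changes are reported via lightweight API calls rather than in-process objects, but the shape is the same: runs are recorded against versioned code, and each output is a new immutable dataset version.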

Learn more about Marquez here and be sure to join the Marquez-Announce and Marquez-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

A warm welcome to Marquez and we look forward to the project’s continued growth and success as part of the LF AI Foundation. To learn about how to host an open source project with us, visit the LF AI website.

Marquez Key Links

Adlik 0.1.0 Release Now Available!

Adlik, an LF AI Foundation Incubation-Stage Project, has released version 0.1.0. We’re thrilled to see a release from this community, which has been hard at work over the past few months! Adlik is a toolkit for accelerating deep learning inference: it provides overall support for bringing trained models into production and eases the learning curves of different inference frameworks. In Adlik, the Model Optimizer and Model Compiler deliver optimized and compiled models for a given hardware environment, and the Serving Engine provides deployment solutions for cloud, edge, and device.

In version 0.1.0, Adlik enhances features, improves usability, and addresses miscellaneous bug fixes. A few of the release highlights include the following: 

  • Model Compiler
    • A new framework that is easy to extend and maintain
    • Compilation of models trained in Keras, TensorFlow, and PyTorch for better execution on CPUs/GPUs
  • Model Optimizer
    • Multi-node, multi-GPU training and pruning
    • Configurable filter pruning to produce smaller inference models
    • Small-batch dataset quantization for TF-Lite and TF-TRT
  • Inference Engine
    • Management of multiple models and versions
    • HTTP/gRPC interfaces for the inference service
    • A runtime scheduler that supports scheduling of multiple model instances
    • Integration of multiple DL inference runtimes, including TensorFlow Serving, OpenVINO, TensorRT, and TF Lite
    • Integration of dlib to support ML runtimes

This release also contains a Benchmark Test Framework for DL models, which enables standardized performance benchmarking of models running in the same hardware environment with the different runtimes Adlik supports. In this framework, the whole testing pipeline is executed automatically with a containerized solution. 

The Adlik team expressed a special thank you to contributors from ZTE, China Mobile, and China Unicom for their extra hard work.

The Adlik Project invites you to adopt or upgrade to version 0.1.0, and welcomes feedback. To learn more about the Adlik 0.1.0 release, check out the full release notes. Want to get involved with Adlik? Be sure to join the Adlik-Announce and Adlik Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Adlik team! We look forward to continued growth and success as part of the LF AI Foundation. To learn about hosting an open source project with us, visit the LF AI Foundation website.

Adlik Key Links

Newly Elected ONNX Steering Committee Announced!

Author(s): The ONNX Steering Committee

The ONNX community continues to grow with new tools supporting the spec and nearly two hundred individuals from one hundred organizations attending the April 2020 community meeting. Along with the strong growth of this open source project, we are excited to announce that the governance structure is working well and elections have resulted in newly appointed steering committee members. This is another important step to ensure an open, adaptive, sustainable future for the ONNX project.

The members of the ONNX Steering Committee as of June 1st are: 

The community expresses sincere gratitude to the three outgoing members, both for their exemplary service and for their continuing participation in and support of the ONNX spec and community: 

The past and present steering committee members wish to thank all those who self-nominated, as well as those who voted in the election. Solid contributions to SIGs, Working Groups, and Community Meetings continue to be the best way to grow eminence in the ONNX community. For those who plan to self-nominate in next year’s election, participation is essential. Also, community outreach to other projects in the LF AI Foundation and contributions to defining the ONNX Roadmap are encouraged.

ONNX is an open format to represent and optimize deep learning and machine learning models that deploy and execute on diverse hardware platforms and clouds. ONNX allows AI developers to more easily move AI models between tools that are part of trusted AI/ML/DL workflows. The ONNX community was established in 2017 to create an open ecosystem for interchangeable models, and it quickly grew as tool vendors and enterprises adopted ONNX for their products and internal processes.

Support for the ONNX spec as an industry standard continues to grow, with contributors from across geographies and industry sectors. ONNX is a graduated project of the LF AI Foundation under multi-vendor open governance, in accordance with industry best practice. The ONNX community values are: open, welcoming, respectful, transparent, accessible, meritorious, and speedy. In accordance with our community principle of being welcoming, all ONNX Steering Committee meetings are open for the community to attend. We welcome your contributions to ONNX.

Congrats to everyone involved and thank you for your contributions to the ONNX project!

The ONNX Steering Committee

ONNX Key Links

ONNX 1.7 Now Available!

ONNX, an LF AI Foundation Graduated Project, has released version 1.7 and we’re thrilled to see this latest set of improvements. ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. 

In version 1.7, you can find the following:

  • Model training introduced as a technical preview, which expands ONNX beyond its original inference capabilities 
  • New and updated operators to support more models and data types
  • Functions are enhanced to enable dynamic function body registration and multiple operator sets
  • Operator documentation is also updated with more details to clarify the expected behavior

To learn more about the ONNX 1.7 release, check out the full release notes. Want to get involved with ONNX? Be sure to join the ONNX Announce and ONNX Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the ONNX team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.

ONNX Key Links

Angel 3.1.0 Release Now Available!

Angel, an LF AI Foundation Graduated Project, has released version 3.1.0, and we’re thrilled to see lots of momentum within this community. The Angel Project is a high-performance distributed machine learning platform based on the Parameter Server architecture, running on YARN and Apache Spark. It is tuned for performance with big data and offers advantages in handling higher-dimensional models. It supports big, complex models with billions of parameters, partitions the parameters of complex models across multiple parameter-server nodes, and implements a variety of machine learning algorithms using efficient model-updating interfaces and functions, as well as flexible consistency models for synchronization.

In version 3.1.0, Angel adds a variety of improvements, including: 

  • New graph learning features, reflecting the growing adoption of graph data structures in applications such as social network analysis and recommendation systems
  • A collection of well-implemented graph algorithms, such as traditional learning, graph embedding, and graph deep learning – these algorithms can be used directly in production models with simple configurations
  • An operator API for graph manipulations, including building graphs and operating on vertices and edges
  • Support for GPU devices in the PyTorch-on-Angel running mode – with this feature it’s possible to leverage GPU hardware to speed up computation-intensive algorithms

The Angel Project invites you to adopt or upgrade to Angel 3.1.0 in your application, and welcomes feedback. To learn more about the Angel 3.1.0 release, check out the full release notes. Want to get involved with Angel? Be sure to join the Angel-Announce and Angel Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Angel team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.

Angel Key Links

Thank You IBM & ONNX for a Great LF AI Day

A big thank you to IBM and ONNX for hosting a great virtual meetup! The LF AI Day ONNX Community Virtual Meetup was held on April 9, 2020 and was a great success with close to 200 attendees joining live. 

The meetup included ONNX Community updates, partner/end-user stories, and SIG/WG updates. The virtual meetup was an opportunity to connect with and hear from people working with ONNX across a variety of groups. A special thank you to Thomas Truong and Jim Spohrer from IBM for working closely with the ONNX Technical Steering Committee, SIGs, and Working Groups to curate the content. 

Missed the meetup? Check out the recordings at bit.ly/lfaiday-onnxmeetup-040920.

This meetup took on a virtual format, but we look forward to connecting again at an in-person event soon. LF AI Day is a regional, one-day event hosted and organized by local members with support from LF AI, its members, and projects. If you are interested in hosting an LF AI Day, please email info@lfai.foundation to discuss.

ONNX, an LF AI Foundation Graduated Project, is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. Be sure to join the ONNX Announce mailing list and ONNX Gitter to join the community and stay connected on the latest updates. 

ONNX Key Links

sparklyr 1.2.0 Now Available!

sparklyr, an LF AI Foundation Incubation Project, has released version 1.2.0, and we’re excited to see a great release with contributions from several members of the community. sparklyr is an R package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics. 

In version 1.2.0, sparklyr adds a variety of improvements, including: 

  • sparklyr now supports Databricks Connect 
  • A number of interop issues with Spark 3.0.0-preview were fixed
  • The `registerDoSpark` method was implemented to allow Spark to be used as a `foreach` parallel backend in sparklyr (see registerDoSpark.Rd)
  • And more… A complete list of changes can be found in the sparklyr 1.2.0 section of the NEWS.md file: sparklyr-1.2.0

The power of open source projects is the aggregate contributions originating from different community members and organizations that collectively help drive the advancement of the projects and their roadmaps. The sparklyr community is a great example of this process and was instrumental in producing this release. A special THANK YOU goes out to the following community members for their contributions of commits and pull request reviews!

To learn more about the sparklyr 1.2.0 release, check out the full release notes. Want to get involved with sparklyr? Be sure to join the sparklyr-Announce and sparklyr Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the sparklyr team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.

sparklyr Key Links

ForestFlow Joins LF AI as New Incubation Project

The LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), is today announcing ForestFlow as its latest Incubation Project. ForestFlow is a scalable, policy-based, cloud-native machine learning model server. ForestFlow strives to strike a balance between the flexibility it offers data scientists and the adoption of standards, while reducing friction between data science, engineering, and operations teams. ForestFlow was released and open sourced by DreamWorks.

“We are very pleased to welcome ForestFlow to LF AI. ForestFlow provides an easy way to deploy ML models to production and realize business value on an open source platform that can scale as the user’s projects and requirements scale,” said Dr. Ibrahim Haddad, Executive Director of LF AI. “We look forward to supporting this project and helping it to thrive under a neutral, vendor-free, and open governance.” LF AI supports projects via a wide range of benefits; and the first step is joining as an Incubation Project. 

Ahmad Alkilani, Principal Architect and developer of ForestFlow at DreamWorks Animation, said, “We developed ForestFlow in response to our need to move ML models into production that affected the scheduling and placement of rendering jobs and the throughput of our rendering pipeline which has a material impact to our bottom line. Our focus was on maintaining our own teams’ agility and keeping ML models fresh in response to changes in data, features, or simply the production tools that historical data was associated with. Another pillar for developing ForestFlow was the openness of the solution we chose. We were looking to minimize vendor lock-in having a solution that was amenable to on-premise and cloud deployments all the same while offloading deployment complexities from the job description of a Data Scientist. We want our team to focus on extracting the most value they can out of the data we have and not have to worry about operational concerns. We also needed a hands-off approach to quickly iterate and promote or demote models based on observed metrics of staleness and performance. With these goals in mind, we also realize the value of open source software and the value the Linux Foundation brings to any project and specifically LF AI in this space. DreamWorks Animation is pleased that LF AI will manage the neutral open governance for ForestFlow to help foster the growth of the project.”

Continuous deployment and lifecycle management of machine learning/deep learning models is widely regarded as a primary bottleneck to gaining value from ML projects. Hear from the ForestFlow team about why they set out to create this project: 

  • We wanted to reduce friction between our data science, engineering, and operations teams
  • We wanted to give data scientists the flexibility to use the tools they wanted (H2O, TensorFlow, Spark export to PFA, etc.)
  • We wanted to automate certain lifecycle management aspects of model deployments, like automatic performance- or time-based routing and retirement of stale models
  • We wanted a model server that allows easy A/B testing, shadow (listen-only) deployments, and canary deployments. This lets our data scientists experiment with real production data, without impacting production, using the same tooling they would when deploying to production.
  • We wanted something that was easy to deploy and scale for different deployment scenarios (a single instance in an on-prem data center, a cluster of instances, Kubernetes-managed, cloud-native, etc.)
  • We wanted the ability to treat inference requests as a stream and log predictions as a stream. This allows us to test new models against a stream of older inference requests.
  • We wanted to avoid the “super-hero” data scientist who knows how to dockerize an application, apply the science, build an API, and deploy to production. That does not scale well and is difficult to support and maintain.
  • Most of all, we wanted repeatability. We didn’t want to reinvent the wheel once we had support for a specific framework.

ForestFlow is policy-based to support the automation of machine learning/deep learning operations, which is critical to scaling human resources. ForestFlow lends itself well to workflows based on automatic retraining, version control, A/B testing, canary model deployments, shadow testing, and automatic time- or performance-based model deprecation and routing in real time. The aim of ForestFlow is to give data scientists a simple means to deploy models to a production system with minimal friction, accelerating the development-to-production value proposition. Check out the quickstart guide for an overview of setting up ForestFlow and an example of inference. 
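To illustrate what "policy-based" means in practice, here is a minimal Python sketch of time-based model retirement and performance-based routing, the two kinds of policies mentioned above. The class, method names, and the one-week threshold are invented for this example and are not ForestFlow's actual API.

```python
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # hypothetical time-based policy: retire after a week

class ModelServer:
    """Toy stand-in for a policy-driven model server."""

    def __init__(self):
        self.models = {}  # name -> {"score": observed metric, "deployed_at": timestamp}

    def deploy(self, name, score, deployed_at):
        self.models[name] = {"score": score, "deployed_at": deployed_at}

    def retire_stale(self, now):
        # Time-based deprecation: drop any model past the age threshold,
        # with no human in the loop.
        self.models = {n: m for n, m in self.models.items()
                       if now - m["deployed_at"] <= MAX_AGE_SECONDS}

    def route(self):
        # Performance-based routing: send live traffic to the model with
        # the best observed metric among those still deployed.
        return max(self.models, key=lambda n: self.models[n]["score"])

now = time.time()
server = ModelServer()
server.deploy("model-a", score=0.91, deployed_at=now - 10 * 24 * 3600)  # stale
server.deploy("model-b", score=0.87, deployed_at=now - 3600)
server.deploy("model-c", score=0.93, deployed_at=now - 7200)
server.retire_stale(now)
chosen = server.route()  # "model-c": best score among non-stale models
```

The real system evaluates such policies continuously against observed metrics, so models are promoted or demoted hands-off as the sketch suggests.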

Learn more about ForestFlow here and be sure to join the ForestFlow-Announce and ForestFlow-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

A warm welcome to ForestFlow and we look forward to the project’s continued growth and success as part of the LF AI Foundation. To learn about how to host an open source project with us, visit the LF AI website.

ForestFlow Key Links
