Enabling the Open Source AI Native Ecosystem with an AI Specific Computing Framework

April 6, 2020

Guest Author: Zhipeng Huang, Principal Engineer, Huawei Technologies; Huawei’s Representative on the LF AI Foundation Technical Advisory Council

Meet MindSpore: Huawei’s Open Source AI Computing Framework 

We are very excited to announce that Huawei is open sourcing MindSpore, an AI computing framework. MindSpore was developed by Huawei with the goal of enabling on-demand collaboration across cloud, edge, and device. It provides unified APIs and end-to-end AI capabilities for model development, execution, and deployment in all scenarios.

Using a distributed architecture (Figure 1), MindSpore leverages a native automatically differentiable programming paradigm and new AI native execution modes to achieve better resource efficiency, security, and trustworthiness. Meanwhile, MindSpore makes full use of the computing power of Ascend AI processors and lowers the entry requirements of industry AI development, bringing inclusive AI faster to reality.

Figure 1: MindSpore High Level Architecture
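
To make the idea of automatically differentiable programming concrete, here is a minimal sketch in plain Python. It uses forward-mode differentiation with dual numbers, a deliberately simple technique chosen for illustration; it is not MindSpore's implementation, and the names `Dual`, `grad`, and `model` are hypothetical. The point is only that the model is written as ordinary math and the derivative comes out without a hand-written gradient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def grad(f):
    """Return a function that computes df/dx at a point."""
    def df(x):
        # Seed the input with derivative 1, then read the derivative off the result.
        return f(Dual(float(x), 1.0)).deriv
    return df


def model(x):
    # Written as plain math: 3x^2 + 2x + 1
    return x * x * 3 + 2 * x + 1


print(grad(model)(2.0))  # derivative 6x + 2 evaluated at x = 2
```

Production frameworks typically rely on reverse-mode or source-transformation approaches that scale to many parameters; the dual-number version above only shows the programming model in its smallest form.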

MindSpore is designed to give data scientists and algorithm engineers a friendly development experience and efficient execution, with native support for Ascend AI processors and software-hardware co-optimization.

Our goal in open sourcing MindSpore is to provide the global open source AI community with a computing framework that will further advance the development and enrichment of the AI software/hardware application ecosystem.

Building an AI Native Programming Ecosystem with an Emphasis on Interoperability

With the recent development of Pyro (an incubation project of the LF AI Foundation), Julia, and MindSpore, it has become evident that AI native programming is the next trend in deep learning framework development. Gone are the days when mathematical libraries were bolted onto existing engineering toolsets; data scientists will increasingly use their familiar tools with more engineering capability added. AI developers should be able to write models in their mathematical form without facing the steep learning curve of software engineering.

To build the new AI native programming ecosystem, interoperability is a critical issue to solve. On the northbound side (Figure 2, red blocks), interoperability beyond the IR, covering things like crypto, the type system, and metadata, also needs to be addressed. On the southbound side (Figure 2, purple blocks), in addition to supporting heterogeneous computing hardware, storage interoperability should also be considered.

Figure 2: Interoperability Proposal to be discussed in LF AI’s Technical Advisory Council

The MindSpore community will work with the LF AI Foundation community, and more specifically with the Technical Advisory Council through its ML Workflow effort, to address interoperability issues. We also plan to engage with the ONNX community (ONNX is a Graduate-level project in the LF AI Foundation) so that, by exporting ONNX models, developers can use MindSpore in more scenarios.

Working with Kubeflow

MindSpore also uses the cloud native ecosystem for deployment and management. With the recent Kubeflow 1.0 release and the upcoming Kubernetes 1.18 release, we can experiment with the latest cloud native computing technology for agile MLOps.

Figure 3: MindSpore and the Cloud Native Ecosystem

To take advantage of Kubeflow and Kubernetes, the first thing we did was write an operator for MindSpore, called ms-operator, and define a MindSpore CRD (Custom Resource Definition). The current version of ms-operator is based on early versions of PyTorch Operator and TF Operator.

The ms-operator implementation contains the specification and implementation of the MSJob custom resource definition. We will demonstrate a walkthrough of building the ms-operator image and creating a simple msjob on Kubernetes with the MindSpore `0.1.0-alpha` image. The MindSpore community is still working on implementing distributed training on different backends so that, in the near future, users can create and manage msjobs like other built-in resources on Kubernetes.
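
As a rough illustration of what an MSJob custom resource might look like, the sketch below builds a manifest as a Python dictionary and prints it as JSON. Only the kind `MSJob` and the `0.1.0-alpha` image tag come from the text above; the `apiVersion`, the `replicaSpecs` layout, the image repository name, the job name, and the helper `make_msjob` are assumptions made for illustration, so the actual schema should be checked against the ms-operator repository.

```python
import json


def make_msjob(name, image, command, replicas=1):
    """Build a hypothetical MSJob manifest as a plain dict.

    The field layout is an assumption modeled on other Kubeflow
    training operators, not the confirmed ms-operator schema.
    """
    return {
        "apiVersion": "kubeflow.org/v1",  # assumed API group and version
        "kind": "MSJob",
        "metadata": {"name": name},
        "spec": {
            "replicaSpecs": [  # assumed field name
                {
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    "name": "mindspore",
                                    "image": image,
                                    "command": command,
                                }
                            ],
                            "restartPolicy": "OnFailure",
                        }
                    },
                }
            ]
        },
    }


manifest = make_msjob(
    name="msjob-example",  # hypothetical job name
    image="mindspore/mindspore-cpu:0.1.0-alpha",  # repository name is an assumption
    command=["python", "-c", "import mindspore"],
)

# Print the manifest as JSON.
print(json.dumps(manifest, indent=2))
```

If the MSJob CRD were installed and the schema matched, the printed JSON could be saved to a file and submitted with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.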

The MindSpore community is working to collaborate with the Kubeflow community and to make ms-operator more capable, better organized, and up to date. Together, these components make it easy for machine learning engineers and data scientists to leverage cloud assets (public or on-premises) for machine learning workloads.