Carving out a niche in the rapidly evolving arena of AI (Artificial Intelligence) isn’t easy – especially for an organization that doesn’t have first-mover advantage. With a clear mandate from Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT) in 2016 however, the country’s largest, comprehensive research institution sought to rapidly establish itself via the RIKEN Center for Advanced Intelligence Project (AIP Center). By emphasizing fundamental AI research and its societal impact, in tandem with a Japan-centric thrust for technology transfer, the AIP Center is playing to its strengths – in the latter case, for example, in making use of public data specific to Japan in addressing unique social issues from medical diagnosis to urban infrastructure management.
The AIP Center’s entry onto the international stage of AI research has been expedited and facilitated by its partnership with Fujitsu, which in turn partnered with Univa to deliver the RAIDEN (Riken AIp Deep learning ENvironment) supercomputer. When ranked last November for SC17 in Denver, RAIDEN achieved a modest Top500 ranking along with an impressive Green500 ranking – placing it amongst the Top 10 Green500 finishers, of which seven are likewise located in Japan, a nation known for prizing energy efficiency. Owing to a significant and recent upgrade, the now 54 PFLOPS RAIDEN supercomputer is expected to improve its Top500 ranking while likely maintaining its Green500 stature in the updated versions of the lists anticipated in advance of ISC18.
Although the increase in the number of NVIDIA DGX-1 servers is partially responsible for RAIDEN’s enhanced performance, it’s the adoption of the latest NVIDIA Tesla V100 GPUs that’s key to the 13.5X speedup. The upgrade was motivated by more than bragging rights: the AIP Center’s international research collaborations demand a significant computational resource as they push the bounds of AI in every sense – for example, as researchers apply architecturally complex, deep neural networks to extremely large volumes of data.
Frustrated by past experiences with supercomputers, wherein an upgrade might render results irreproducible, researchers at RIKEN sought to seize the opportunity presented by containerization when it came to RAIDEN – i.e., “ … for storing past research assets for each software …” to quote unit leader Kazuki Yoshizoe of the Search and Parallel Computing Unit from a Fujitsu case study.
This predisposition towards containerization in the operational use of GPU-enabled RAIDEN introduces a corresponding and compelling demand for Univa Grid Engine. Aside from being the de facto standard for enterprise-class deployments of shared computational infrastructures for managed HPC and AI workloads, Univa Grid Engine delivers industry-leading integrations with NVIDIA GPUs and Docker containers:
- Through the differentiating abstraction of resource maps (RSMAPs), GPUs – from isolated devices to densely packed collections – are identified, used, monitored and reported upon for their computational capabilities. Deep Learning frameworks such as distributed TensorFlow can thus employ GPUs in executing applications and workflows.
- Through use of (optionally cached) images from the public Docker Hub or a private registry, containerized applications execute along traditional lines – meaning that they are controlled, limited, accounted for, etc., in precisely the same fashion as non-containerized applications.
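The resource-map idea above can be illustrated with a toy allocator. This is a conceptual sketch only – the class, the GPU names, and the first-fit policy are hypothetical illustrations, not Univa Grid Engine internals:

```python
# Toy illustration of the resource-map (RSMAP) concept: a host exposes
# named GPU instances, and each job is granted specific instances rather
# than an anonymous count, so usage can be tracked and reported per device.
# All names and the allocation policy here are hypothetical.

class GpuResourceMap:
    def __init__(self, gpu_ids):
        self.free = list(gpu_ids)      # e.g. ["gpu0", "gpu1", ...]
        self.in_use = {}               # job_id -> granted GPU ids

    def allocate(self, job_id, count):
        """Grant `count` specific GPU instances to a job, or None if scarce."""
        if count > len(self.free):
            return None                # job must wait; no oversubscription
        granted, self.free = self.free[:count], self.free[count:]
        self.in_use[job_id] = granted
        return granted

    def release(self, job_id):
        """Return a finished job's GPU instances to the free pool."""
        self.free.extend(self.in_use.pop(job_id, []))

rsmap = GpuResourceMap(["gpu0", "gpu1", "gpu2", "gpu3"])
print(rsmap.allocate("job.1", 2))   # ['gpu0', 'gpu1']
print(rsmap.allocate("job.2", 3))   # None: only 2 GPUs remain free
rsmap.release("job.1")
print(rsmap.allocate("job.2", 3))   # ['gpu2', 'gpu3', 'gpu0']
```

Because each grant names concrete devices, a scheduler built on this abstraction can report exactly which GPUs each job consumed – the property that makes per-device monitoring and accounting possible.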
In the case of workload management for RAIDEN, Univa Grid Engine delivers combined support for GPUs and Docker containers – meaning that AIP Center researchers can run their Deep Learning applications within Docker containers that make abstracted use of ‘external’ GPUs via device mappings (i.e., between a container and a physical host). To ensure highly reproducible results, these mappings can be bound in a fashion that both optimizes and guarantees allocations – even in the case of this shared environment where a multitude of AI applications compete for RAIDEN’s resources in real time.
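The container-to-host device mapping described above can be sketched as follows. This is an assumption-laden illustration, not the scheduler's actual mechanism: the helper function is hypothetical, though `docker run --device` and the `/dev/nvidiaN` device-node convention on Linux hosts are real, and the TensorFlow image name is just an example:

```python
# Conceptual sketch of mapping scheduler-granted GPUs into a container:
# each granted GPU index becomes a `--device` argument, so the container
# sees exactly (and only) the devices allocated to that job.

def docker_args_for(granted_gpu_ids):
    """Map granted GPU indices to `docker run --device` flags (hypothetical helper)."""
    args = []
    for gpu_id in granted_gpu_ids:
        dev = f"/dev/nvidia{gpu_id}"           # host device node (Linux/NVIDIA)
        args += ["--device", f"{dev}:{dev}"]   # host path : container path
    return args

granted = [1, 3]   # e.g. the scheduler granted GPUs 1 and 3 to this job
cmd = ["docker", "run", *docker_args_for(granted),
       "tensorflow/tensorflow:latest-gpu"]
print(" ".join(cmd))
```

Binding allocations this way is what makes results reproducible in a shared environment: two jobs granted disjoint device sets cannot interfere with one another's GPUs, no matter how many containers run on the same host.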
In support of RIKEN’s international community of researchers, the AIP Center’s vision for RAIDEN as a shared and substantial platform for next-generation AI research is being realized in great measure. Whereas the AIP Center partnered with Fujitsu to develop and deliver an integrated solution for Deep Learning, Fujitsu turned to Univa to ensure that RAIDEN would be utilized effectively and efficiently in practice; in selecting Univa Grid Engine for RAIDEN, the AIP Center’s research needs are addressable today and future-proofed through adoption of a workload-management solution that is both GPU-enabled and containerization-native.
Because we’re finding that organizations such as the AIP Center are expressing an increasingly common set of requirements for GPU-enabled and containerization-native workload management, you can expect to hear much more from Univa on this topic in the countdown to ISC18 and beyond.