IBM and Nvidia teach supercomputers to learn

IBM and Nvidia have collaborated on a deep learning tool to “help train computers to think and learn in more human-like ways at a faster pace”, said IBM.

IBM S822LC server interior

In this case, ‘deep learning’ is being used to mean crunching through vast tracts of data to detect and rank its most important aspects. IBM sees it being used for facial recognition to cut fraud in banking, in self-driving vehicles and in fully automated retail call centres.

The result is ‘PowerAI’, a software toolkit for deep learning that runs on IBM’s recently-announced Power S822LC artificial intelligence servers – the highest performing server in IBM’s OpenPOWER LC line-up – which combine IBM’s Power architecture with Nvidia’s NVLink interconnect and GPUs.

It is a set of binary distributions including Caffe, Torch and Theano. Additional distributions include IBM and NVIDIA versions of Caffe (IBM-Caffe and NVCaffe) optimised for Power8 chips and NVLink. Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center.

The toolkit uses NVIDIA GPUDL libraries, including cuDNN, cuBLAS and NCCL, as part of NVIDIA SDKs for multi-GPU acceleration on IBM servers.

“PowerAI also provides a continued path for Watson, IBM’s cognitive solutions platform,” said IBM.

According to IBM:
The hardware-software solution provides >2x performance over “comparable servers” with four GPUs running AlexNet with Caffe.

  • Based on AlexNet Training for Top-1 50% Accuracy. IBM Power S822LC for HPC configuration: 16 cores (eight cores/socket) at 4.025GHz with 4xNvidia Pascal P100 GPUs; 512Gbyte memory; Ubuntu 16.04.1 running NVCaffe 0.14.5 compared to IBM Power S822L configuration: 20 cores (10 cores/socket) at 3.694GHz with 4xNvidia M40 GPUs; 512Gbyte memory; Ubuntu 16.04 running BVLC-Caffe f28f5ae2f2453f42b5824723efc326a04dd16d85. Software stack for both: G++ – 5.3.1, Gfortran –5.3.1, OpenBlas – 0.2.18, Boost –1.58.0, CUDA 8.0 Toolkit, Lapack –3.6.0, Hdf5 –1.8.16, Opencv –2.4.9.

The same four-GPU Power-based configuration running AlexNet with BVLC Caffe can also outperform x86 configurations with eight M40 GPUs.

  • Based on IBM Power S822LC for HPC configuration: 20 cores (10 cores/socket) at 3.95GHz with 4xNvidia Pascal P100 GPUs; 512Gbyte memory; Ubuntu 16.04 LE running IBM version BVLC 1.0.0-rc3 compared to Intel E5-2640v4 (Broadwell): 20 cores (10 cores/socket) at 3.6GHz with 8xNvidia M40 GPUs; 512Gbyte memory; Ubuntu 16.04 LE running BVLC-Caffe 985493e9ce3e8b61e06c072a16478e6a74e3aa5a. Software stack for both: G++ – 5.4, Gfortran – 5.4, OpenBlas – 0.2.19, Boost – 1.58.0, CUDA 8.0 Toolkit, Lapack – 3.6.0, Hdf5 – 1.8.16, Opencv – 2.4.9.

PowerAI is available now, at no charge to customers of IBM’s Power S822LC, and is designed to run on any number of servers from one to thousands.

Power S822LC is used in Juron, a pilot system for the European Commission’s Human Brain Project at the Juelich Supercomputing Centre. The project also operates a Cray computer called Julia.

Power8, Nvidia and S822LC

Nvidia Tesla P100

For supercomputing, Nvidia created the Tesla P100 graphics processor, based on its Pascal architecture.

P100 is an SXM2 form factor chip-on-wafer module (see photo) capable of 21.2Tflop at 16bit precision, 10.6Tflop at single (32bit) precision, and 5.3Tflop at double (64bit) precision. In the module alongside the GPU is 16Gbyte of HBM2 memory linked at 732Gbyte/s.

There are four of these in an S822LC, in two pairs, each pair working with its own Power8 processor.
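As a rough check on these figures, the per-module peak rates can be scaled to the four modules in one server. The sketch below is illustrative arithmetic only: the names are made up for this example, and the totals are theoretical peaks that assume perfect scaling across GPUs, which real workloads are unlikely to reach.

```python
# Peak throughput per P100 module, in Tflop, as quoted in the article.
P100_PEAK_TFLOP = {
    "half (16bit)": 21.2,
    "single (32bit)": 10.6,
    "double (64bit)": 5.3,
}

GPUS_PER_SERVER = 4      # four P100 modules in one S822LC
HBM2_GBYTE_PER_S = 732   # HBM2 bandwidth per module


def server_peak_tflop(per_gpu_tflop, gpus=GPUS_PER_SERVER):
    """Theoretical peak for one server, assuming perfect scaling (an idealisation)."""
    return per_gpu_tflop * gpus


def flop_per_byte(per_gpu_tflop, gbyte_per_s=HBM2_GBYTE_PER_S):
    """Operations one GPU can perform per byte read from its HBM2, at peak rates."""
    return (per_gpu_tflop * 1e12) / (gbyte_per_s * 1e9)


for precision, tflop in P100_PEAK_TFLOP.items():
    print(f"{precision}: {server_peak_tflop(tflop):.1f}Tflop per server, "
          f"{flop_per_byte(tflop):.1f} flop per HBM2 byte")
```

The flop-per-byte ratio hints at why memory and link bandwidth matter so much here: at peak rates the GPU can do many operations in the time it takes to fetch a single byte.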

Power8 processors have 10 cores running at up to 3.26GHz in this application, communicating with up to 500Gbyte (1Tbyte total for both CPUs) of local memory at 115Gbyte/s.
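To put the bandwidth figures in perspective, the time to move a block of data is roughly its size divided by the bandwidth. The sketch below applies that to the two memory figures quoted so far. The function name is hypothetical, and the result is a lower bound that ignores latency and protocol overhead.

```python
def transfer_seconds(size_gbyte, bandwidth_gbyte_per_s):
    """Lower-bound time to move size_gbyte at a sustained bandwidth (no overheads)."""
    if bandwidth_gbyte_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gbyte / bandwidth_gbyte_per_s


# Figures from the article: 16Gbyte of HBM2 at 732Gbyte/s per P100,
# and up to 500Gbyte of host memory at 115Gbyte/s per Power8.
hbm2_sweep = transfer_seconds(16, 732)
host_sweep = transfer_seconds(500, 115)

print(f"Read all of one P100's HBM2 once: {hbm2_sweep * 1000:.1f} ms")
print(f"Read all of one CPU's memory once: {host_sweep:.2f} s")
```

The same function can be reused with a link bandwidth in place of a memory bandwidth to estimate how long a data set takes to reach a GPU.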

IBM Power8 server

Nvidia considered PCIe Gen3 too slow for P100, so it created NVLink.

In the S822LC, each host CPU communicates with its pair of P100 GPUs over two 80Gbyte/s NVLinks (see diagram), which gives the GPUs fast access to large data sets held in the host’s local memory. A further 80Gbyte/s bus links the GPUs in each pair.

All this said, Nvidia also makes a PCIe version of P100, available in Cray XC50 and CS-Storm supercomputers.

