Nvidia announces DGX-2 as the world's largest GPU

     NVIDIA announced the NVIDIA DGX-2 supercomputer, consisting of 16 Volta GPUs linked together with NVSwitch. The system's power consumption is a fraction of what a comparable CPU-based cluster would draw, and it comes at a premium: $399,000. The use of NVSwitch makes it much faster than a server rack holding 16 Volta GPUs in a vanilla setup. The server has a total of 81,920 CUDA cores, 512 GB of HBM2 memory, 14.4 TB/s of aggregate memory bandwidth, and 300 GB/s of GPU-to-GPU bandwidth. The rack draws a total of 10,000 watts and weighs 350 pounds. The DGX-2 has 10X the processing power of the DGX-1 unveiled in September 2017.
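
     As a sanity check on those figures, the arithmetic below reproduces the quoted aggregate numbers from the published per-V100 specs (roughly 900 GB/s of HBM2 bandwidth and six 50 GB/s NVLink connections per GPU). This is a back-of-the-envelope sketch, not an NVIDIA-supplied calculation.

    # Reproduces the DGX-2 bandwidth figures quoted above from per-GPU specs.
    GPUS = 16
    HBM2_BW_PER_GPU_GBS = 900       # GB/s of HBM2 bandwidth per Tesla V100
    NVLINK_LINKS_PER_GPU = 6
    NVLINK_BW_PER_LINK_GBS = 50     # GB/s bi-directional per NVLink 2.0 link

    aggregate_hbm2 = GPUS * HBM2_BW_PER_GPU_GBS                      # 14,400 GB/s
    per_gpu_fabric = NVLINK_LINKS_PER_GPU * NVLINK_BW_PER_LINK_GBS   # 300 GB/s

    print(f"Aggregate HBM2 bandwidth: {aggregate_hbm2 / 1000:.1f} TB/s")  # 14.4 TB/s
    print(f"GPU-to-GPU fabric bandwidth: {per_gpu_fabric} GB/s")          # 300 GB/s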

     With the explosion of AI, with ever more layers, rising training rates sweeping through different frameworks, bigger networks, and more experimentation, the DGX-2 couldn't come at a better time. Developers now have the deep learning training power to tackle the largest datasets and most complex deep learning models. Combined with a fully optimized, updated suite of NVIDIA deep learning software, the DGX-2 is purpose-built for data scientists pushing the limits of deep learning research and computing. The DGX-2 can train FAIRSeq, a state-of-the-art neural machine translation model, in less than two days, a 10x performance improvement over the DGX-1 introduced in September.
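
     To make the workload concrete, here is a minimal sketch of the kind of data-parallel training such a system is built for, written in PyTorch. The model and random data are stand-in placeholders, not FAIRSeq, and nn.DataParallel is just one common way to spread a batch across every visible GPU.

    import torch
    import torch.nn as nn

    # Stand-in model; a real workload would be a large translation network.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

    if torch.cuda.device_count() > 1:
        # Replicates the model on every visible GPU and splits each batch
        # among them; on a DGX-2 that means up to 16 V100s.
        model = nn.DataParallel(model)
    model = model.cuda()

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):                        # placeholder loop on random data
        x = torch.randn(512, 1024).cuda()
        y = torch.randint(0, 10, (512,)).cuda()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()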

     DGX-2 is the latest addition to the NVIDIA DGX product portfolio, which consists of three systems designed to help data scientists quickly develop, test, deploy, and scale new deep learning models and innovations. The DGX-2 sits at the top of the lineup. It joins the NVIDIA DGX-1 system, which features eight Tesla V100 GPUs, and the DGX Station, the world's first personal deep learning supercomputer, with four Tesla V100 GPUs in a desktop design. Together these systems let data scientists scale their work from the complex experiments they run at their desks to the largest deep learning problems.

     What makes the DGX-2 possible is the development of NVSwitch, a new interconnect architecture that allows NVIDIA to scale its AI systems further. The physical switch is built on 12nm process technology from TSMC and has about 2 billion transistors all on its own. It offers 2.4 TB/s of bandwidth. As PCI Express became a bottleneck for multi-GPU systems crunching the enormous data sets typical of deep learning applications, NVIDIA developed NVLink. First released with the Pascal GPU design and carried into Volta, NVLink on the V100 chip supports six connections for a total of 300 GB/s of cross-GPU bandwidth. NVSwitch builds on NVLink as an on-node design and allows any pair of GPUs to communicate at full NVLink speed. This enables the next level of scaling, moving beyond the number of NVLink connections available per GPU and allowing a network to be built around the interface. The switch has 18 links, each carrying eight lanes at 25 Gbps, bi-directionally. Though the DGX-2 uses twelve NVSwitch chips to connect 16 GPUs, NVIDIA states there is no technological reason it couldn't push beyond that; it is simply a question of need and physical capability.
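
     The effect of that fabric is visible from application code: two GPUs connected over NVLink expose direct peer access, and a device-to-device copy runs at fabric speed rather than through host memory over PCI Express. The sketch below, again in PyTorch, times a 1 GiB copy between two GPUs. The device indices and buffer size are illustrative, and measured throughput will vary with the actual topology.

    import time
    import torch

    assert torch.cuda.device_count() >= 2
    # A direct NVLink/NVSwitch connection shows up as CUDA peer access.
    print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

    src = torch.empty(256 * 1024 * 1024, dtype=torch.float32, device="cuda:0")  # 1 GiB
    torch.cuda.synchronize(0)
    t0 = time.time()
    dst = src.to("cuda:1")        # device-to-device copy; rides NVLink if available
    torch.cuda.synchronize(1)
    elapsed = time.time() - t0
    print(f"observed bandwidth: {src.numel() * 4 / elapsed / 1e9:.1f} GB/s")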

     NVIDIA has its own need for high-performance AI compute, and the ability to consolidate that capability and save money on server infrastructure is substantial. NVIDIA is one of the leading developers of artificial intelligence for autonomous driving, robotics training, and algorithm and container optimization. Other customers are buying in as well: organizations like New York University, Massachusetts General Hospital, and UC Berkeley have been using the first-generation system in flagship research roles. The expected buyers are the small groups on the bleeding edge of AI development. A $400K AI accelerator may not directly affect most of NVIDIA's customers, but it cements the company's leadership position and its drive to maintain it. With added pressure from Intel, which is pushing hard into AI and machine learning through acquisitions and internal development, NVIDIA needs to keep moving down this path.