Unlike the current-generation Phi design, which operates as a coprocessor, Knights Landing incorporates x86 cores and can directly boot and run standard operating systems and application code without recompilation. The test system, which resembled a standard server board with a socketed CPU and memory modules, was running a stock Linux distribution. This is made possible by the inclusion of modified Atom Silvermont x86 cores in each Knights Landing 'tile': the chip's basic design unit, consisting of dual x86 cores and vector execution units alongside cache memory and mesh communication circuitry. Each multi-chip package includes a processor with 30 or more (rumored up to 36) tiles and eight high-speed memory chips. Although it wouldn't offer specifics, Intel said the on-package memory, totalling 16GB, is made by Micron with custom I/O circuitry, and it might be a variant of Micron's announced, but not yet shipping, Hybrid Memory Cube. The high-speed memory is conceptually similar to the GDDR5 devices used on GPUs.
Intel is long on device-level technology details, but short on specifics regarding the types of business problems, applications and users Phi targets. So far, Phi has been used by HPC applications like scientific simulations, modeling and design, with product announcements often happening during supercomputing conferences. Yet as NVIDIA demonstrated at GTC, the opportunities for compute hardware supporting wide vector operations built for highly parallelized algorithms, whether general-purpose GPUs or now Xeon Phi, extend far beyond the supercomputer niche. As I previously discussed, NVIDIA has morphed its GPU architecture into a powerful engine for calculating deep learning neural network algorithms that is now being applied to accelerate SQL database operations and analytics.
The internals of a GPU and Xeon Phi are quite different, but the two share common traits: many relatively simple, lower-performance cores (dozens on Phi, thousands on a GPU), vector processing units, and very high-speed local memory and buses. The creative exploitation of this capability isn't limited to scientific or graphics calculations and seems limited only by the imagination of developers. But on this front, Intel didn't have encouraging news. Although Saleh mentioned deep learning as a potential application, he couldn't point to specific customers or researchers using Phi for anything but traditional scientific HPC problems.
Unlike NVIDIA, which announced the price and availability of a Titan X development box designed for researchers exploring GPU applications in deep learning, Intel wouldn't share details about when OEM partners will have Knights Landing systems, whether any would be sized and priced for individual developers, or a timeline for what Saleh characterized as a Phi developer program, which will include access to hardware, updated software tools and training materials. But developer resources, outreach and evangelism are critical to any success Phi is to have, particularly since the hardware itself represents a substantial departure from the standard x86 microarchitecture, and here NVIDIA has a big head start.
Yet thinking of Phi as merely an alternative GPU or vector processor is wrong, since it's actually a hybrid that includes dozens of full-fledged 64-bit x86 cores. This unique design, if used well, could significantly accelerate parallelizable application categories beyond HPC simulation, such as deep learning and data analytics that use vector calculations, while offering drop-in computational offload for standard x86 code.
One intriguing possibility is using Phi as an app container processor for Docker or Rocket microservices, where each container gets its own Atom-class core with a 512-bit vector unit, a chunk (100MB or more) of on-package high-speed memory and access to 4-8GB of system RAM. Using the test hardware Intel demonstrated, that's 240 cores, or 960 application threads, per 2U server. For the right workloads, Phi even makes sense as a virtualization platform hosting dozens, if not hundreds, of guest operating systems.
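The capacity math behind those figures can be sketched in a few lines. The 240-core, 960-thread totals and the 16GB of on-package memory come from the article; the split into four sockets of 60 cores, and four hardware threads per core, are illustrative assumptions consistent with those totals, not confirmed specifications.

```python
# Back-of-the-envelope capacity math for the Phi-as-container-host idea.
# ASSUMPTIONS (not confirmed specs): four Knights Landing packages per
# 2U server, 60 cores each, 4 hardware threads per core.
SOCKETS_PER_2U = 4
CORES_PER_SOCKET = 60
THREADS_PER_CORE = 4
ON_PACKAGE_MEM_GB = 16   # per-package high-speed memory cited by Intel

cores = SOCKETS_PER_2U * CORES_PER_SOCKET
threads = cores * THREADS_PER_CORE
mem_per_core_mb = ON_PACKAGE_MEM_GB * 1024 / CORES_PER_SOCKET

print(cores)    # 240 cores per 2U server
print(threads)  # 960 hardware threads
print(round(mem_per_core_mb))  # ~273MB of on-package memory per core
```

Under these assumptions each core's share of the on-package memory comfortably clears the "100MB or more" figure, which is what makes the one-container-per-core idea plausible.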
Technically, Intel has achieved a tour de force with its Xeon Phi overhaul; however, like any technology that breaks existing paradigms, it will take time for both Intel and developers to fully understand its optimal use and opportune applications. Hopefully the chip and package aren't so difficult to produce that they're prohibitively expensive for anyone but government-funded research labs. The sooner Intel can get Knights Landing into the hands of ordinary developers with a clever idea, the sooner we'll all better understand its potential and witness a new crop of groundbreaking applications.