Graphcore Limited is a British semiconductor company that develops accelerators for AI and machine learning. It has introduced a massively parallel Intelligence Processing Unit (IPU) that holds the complete machine learning model inside the processor.[3]
History
Graphcore was founded in 2016 by Simon Knowles and Nigel Toon.[4]
In July 2017, Graphcore secured a Series B funding round led by Atomico,[7] which was followed a few months later by $50 million in funding from Sequoia Capital.[8]
In December 2018, Graphcore closed its Series D with $200 million raised at a $1.7 billion valuation, making the company a unicorn. Investors included Microsoft, Samsung and Dell Technologies.[9]
On 13 November 2019, Graphcore announced that their Graphcore C2 IPUs were available for preview on Microsoft Azure.[10]
Meta Platforms acquired the AI networking technology team from Graphcore in early 2023.[11]
In 2016, Graphcore announced the world's first graph toolchain designed for machine intelligence, called the Poplar Software Stack.[14][15][16]
In July 2017, Graphcore announced its first chip, called the Colossus GC2, a "16 nm massively parallel, mixed-precision floating point processor", which became available in 2018.[17][18] Packaged as two chips on a single PCI Express card called the Graphcore C2 IPU (Intelligence Processing Unit), it is stated to perform the same role as a GPU when used with standard machine learning frameworks such as TensorFlow.[17] The device relies on scratchpad memory for its performance rather than traditional cache hierarchies.[19]
In July 2020, Graphcore presented its second-generation processor, the GC200, built with TSMC's 7 nm FinFET manufacturing process. The GC200 is a 59-billion-transistor, 823 square-millimeter integrated circuit with 1,472 computational cores and 900 MB of local memory.[20] In 2022, Graphcore and TSMC presented the Bow IPU, a 3D package in which a GC200 die is bonded face to face to a power-delivery die, allowing a higher clock rate at a lower core voltage.[21] Graphcore aims to build a machine, named after I. J. Good, that would enable AI models with more parameters than the human brain has synapses.[21]
Both generations of chip run six threads per tile (for totals of 7,296 threads on the GC2 and 8,832 on the GC200), providing MIMD (Multiple Instruction, Multiple Data) parallelism, with distributed local memory as the only form of memory on the device apart from registers. The older GC2 chip has 256 KiB of memory per tile, while the newer GC200 has about 630 KiB per tile; tiles are grouped into islands of four,[25] which are in turn arranged into columns, and memory latency is lowest within a tile. The IPU supports IEEE FP16 arithmetic with stochastic rounding, as well as single-precision FP32 at lower performance.[26] Code and the data it operates on must fit within a tile's local memory, but through message passing a program can use all on-chip or off-chip memory, and the software stack makes this transparent to AI frameworks, for example via PyTorch support.
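The stochastic rounding mentioned above can be illustrated with a short sketch (a plain NumPy model of the general technique, not Graphcore's hardware implementation): each value is rounded up or down to a neighbouring FP16 value with probability proportional to its proximity, so the rounding error is zero in expectation.

```python
import numpy as np

def stochastic_round_fp16(x, rng):
    """Round a float32 value to float16 stochastically.

    The value is rounded to one of its two neighbouring FP16 values,
    choosing the upper neighbour with probability proportional to how
    far x sits above the lower one, so E[result] == x.
    """
    x = np.float32(x)
    lo = np.float16(x)            # round-to-nearest as a starting point
    if np.float32(lo) > x:        # ensure lo is the neighbour at or below x
        lo = np.nextafter(lo, np.float16(-np.inf))
    if np.float32(lo) == x:       # x is exactly representable in FP16
        return lo
    hi = np.nextafter(lo, np.float16(np.inf))
    p_up = (x - np.float32(lo)) / (np.float32(hi) - np.float32(lo))
    return hi if rng.random() < p_up else lo

# A value between two FP16 neighbours rounds both ways, but the
# average over many trials converges to the original value.
rng = np.random.default_rng(0)
samples = [float(stochastic_round_fp16(1.0003, rng)) for _ in range(20000)]
```

Deterministic round-to-nearest would always map 1.0003 to 1.0 (the FP16 spacing near 1.0 is 2⁻¹⁰ ≈ 0.000977), introducing a systematic bias; stochastic rounding avoids that bias, which is why it helps low-precision training.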