Deci delivered fantastic model optimizations for NVIDIA A100 and H100 GPUs. The amazing part of this story is that the optimization process is automatic! And they have the customer list to prove that it actually works, as well as hardware partners including NVIDIA, Intel, AWS, and many system vendors such as HPE.

While NVIDIA GPUs deliver industry-leading performance, that lead comes at a cost beyond purchase price: power consumption. Power matters for inference at the edge, both in the edge data center (Qualcomm) and in the embedded edge market (in this case, SiMa.ai). Note that these two companies don't really compete, in spite of the common "Edge" terminology. "We've never come across Qualcomm at any of our prospective companies," said Gopal Hegde, VP of Products at SiMa.ai.

Qualcomm's Cloud AI100 was submitted for over 25 server platforms with 320 results, all of which are the best in the industry in terms of power efficiency, latency, and throughput in its class. As a testament to the importance of software optimizations over time, Qualcomm has demonstrated a 75% performance and a 52% power efficiency improvement since it began this journey three years ago.

The Open Division of MLPerf allows for all types of tricks and changes, so long as the required accuracy is reached. The 3.0 results include new benchmarks in the open division, including a block-pruned BERT Large run over a network that delivered 100% accuracy and 2.8x better performance than the closed-division submission.

Some examples of AI at the embedded edge include voice assistants that can understand and respond to user commands, sensors that can detect anomalies and trigger alerts, and autonomous vehicles that can recognize and respond to their surroundings in real time. SiMa.ai shared examples with us to show the broad range of image processing it handles at 15 watts.

To reach the power levels customers demand, one has to design an edge chip from the ground up; scaling down a data center AI chip just won't cut it. So SiMa.ai built its MLSoC chip from the ground up as an embedded platform. In the MLPerf 3.0 round, the company bested NVIDIA Jetson power efficiency for image classification by 47%, not bad for a first round. And remember the performance gains I mentioned above for Qualcomm? I'd expect SiMa to continually increase its performance with each release of its TVM back-end software.

Once again, NVIDIA wins in performance, but it faces increasing competition in power-constrained environments such as the edge data center and the embedded edge, where Qualcomm and SiMa.ai had winning results. NVIDIA has far more software engineers than many of its competitors have employees, and those engineers continue to squeeze more performance out of each generation of chips. It's going to be very difficult for anyone to catch them, especially in the data center and in edge applications requiring the flexibility to run many models.

Notably, Amazon AWS, AMD data center GPUs like the Instinct MI250, Google TPU (which had published great results in the past), Tesla, and Intel Gaudi were all no-shows, as were startups Cerebras, Graphcore, Groq, and SambaNova. It's very hard to imagine these companies aren't interested in showing how well they run the latest models on their latest hardware. When I was at IBM Power servers, we had an unwritten rule: don't publish any benchmark you didn't win. That worked well then, but in the modern world, not so much.

So, how to solve this apparent vendor intransigence? One possibility is to engage the community to run and submit benchmarks, models, and learnings through a new MLCommons Collective Knowledge challenge to run, reproduce, and optimize MLPerf inference v3.0 benchmarks, led by the cTuning foundation and cKnowledge Ltd. Grigori Fursin, the Collective Knowledge founder, said, "The open-source CK technology has helped to automate, unify and reproduce more than 80% of all submission results including 98% of power results with very diverse technology and benchmark implementations from Neural Magic, Qualcomm, Krai, cKnowledge, cTuning, DELL, HPE, Lenovo, Hugging Face, Nvidia and Apple across diverse CPUs, GPUs and DSPs with PyTorch, ONNX, QAIC, TF/TFLite, TVM and TensorRT using popular cloud providers (GCP, AWS, Azure) and individual servers and edge devices provided by the CK users and contributors."

So if this strategy works out, and as the back ends for new chips like TPU, AWS Inferentia, and AMD are completed, we should see an explosion in valid comparisons that will help all users and buyers of AI technology.

Finally, I can't wait to see the LLM benchmark competition when that arrives in six months. Perhaps the massive pile of money now on the table with ChatGPT and its competitors, along with the new CK playground, will enable more companies to be more transparent. I am super excited about this development.
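The block-pruned BERT Large submission mentioned above uses structured (block) pruning, which zeroes whole tiles of a weight matrix rather than individual weights, so the surviving computation stays dense and accelerator-friendly. Here is a rough, hypothetical sketch of the idea in NumPy; the block size, sparsity target, and tile-scoring rule are illustrative assumptions, not details of any actual submission:

```python
# Toy sketch of block (structured) pruning: drop the lowest-magnitude
# (block x block) tiles of a weight matrix. Hypothetical sizes throughout.
import numpy as np

def block_prune(weights: np.ndarray, block: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the fraction `sparsity` of (block x block) tiles with the
    smallest L1 norm. Pruning whole tiles (not scattered weights) is what
    lets hardware turn the sparsity into real speedups."""
    rows, cols = weights.shape
    assert rows % block == 0 and cols % block == 0, "toy sketch: dims must tile evenly"
    # View the matrix as a grid of tiles: tiles[a, b, c, d] = weights[a*block+b, c*block+d]
    tiles = weights.reshape(rows // block, block, cols // block, block)
    scores = np.abs(tiles).sum(axis=(1, 3))          # L1 norm of each tile
    k = int(scores.size * sparsity)                  # number of tiles to drop
    threshold = np.sort(scores, axis=None)[k]        # keep tiles scoring at least this
    mask = (scores >= threshold)[:, None, :, None]   # broadcast back over tile entries
    return (tiles * mask).reshape(rows, cols)

# Prune half of the 4x4 tiles of a random 16x16 "layer".
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
pruned = block_prune(w, block=4, sparsity=0.5)
print(f"zeroed fraction: {np.mean(pruned == 0):.2f}")  # → zeroed fraction: 0.50
```

In practice a pruned model is then fine-tuned to recover accuracy; the open division permits such structural changes as long as the accuracy target is still reached.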