The chip industry is moving toward domain-specific computing, while artificial intelligence (AI) is moving in the opposite direction, and that gap may force significant changes in future chip and system architectures.
Behind this split lies the difference in the time needed to develop hardware versus software. In the 18 months since ChatGPT launched globally, a wave of software startups has been exploring new model architectures and techniques, and given how quickly the tasks being mapped onto hardware keep changing, that pace is likely to continue. Producing a custom chip, however, typically takes more than 18 months.
In a stable world, software does not change significantly over time, and it is worth tailoring hardware to the precise needs of an application or workload. That is one of the main drivers behind RISC-V, where the processor ISA can be designed specifically for a given task. But with AI changing this quickly, hardware may already be outdated by the time it reaches volume production. Unless the specifications are continually updated, application-optimized hardware is unlikely to reach the market fast enough to be useful.
This increases the risk that a domain-specific AI chip will not do its job the first time out, and while that shortfall is being fixed, generative AI will keep evolving.
But this does not mean the end of custom silicon. Data centers are deploying a growing number of processing architectures, each of which outperforms a single general-purpose CPU on specific tasks. "With AI workloads surging in data centers and forcing data-center chips and systems to adapt to rapidly evolving conditions, even the last bastion of general-purpose compute has fallen," said Steve Roddy, Chief Marketing Officer of Quadric.
It does, however, point toward architectures that balance very fast, low-power custom silicon against more general-purpose devices or chiplets.
"In the field of artificial intelligence, there is a strong demand to make things as universal and programmable as possible, because no one knows when the next LLM thing will appear and completely change the way they do things," said Elad Alon, CEO of Blue Cheetah. "The more you get into trouble, the more likely you are to miss the trend. At the same time, it is evident that it is almost impossible to meet the computing power required to use a fully universal system, and therefore it is also almost impossible to meet power and energy requirements. There is a strong demand for customized hardware to make it more efficient on specific things known today."
The challenge is how to efficiently map software onto such a heterogeneous array of processors, something the industry has not yet fully mastered. The more processor architectures coexist, the harder the mapping problem becomes. "In a modern chip you have a GPU, a neural processing unit, and a CPU," said Frank Schirrmeister, Vice President of Solutions and Business Development at Arteris (now Executive Director of Strategic Projects and System Solutions at Synopsys). "You have at least three compute options, and you have to decide where to map things and put the right abstraction layers in place. We used to call that hardware/software co-design. When you port an algorithm, or part of one, to the NPU or GPU, you restructure the software to move more of its execution onto the more efficient implementation. There is still a general-purpose component in the compute that orchestrates the different elements."
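The mapping problem Schirrmeister describes can be illustrated with a toy dispatch layer that routes each operator to the cheapest engine supporting it, with the CPU as the general-purpose fallback. The backend names, cost table, and dispatch helper below are invented for illustration only and are not any vendor's API; a minimal Python sketch:

from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    cost: dict  # relative cost per operator type (lower is better); absent = unsupported

BACKENDS = [
    Backend("npu", {"matmul": 1, "conv2d": 1, "softmax": 3}),
    Backend("gpu", {"matmul": 2, "conv2d": 2, "softmax": 2, "layernorm": 2}),
    Backend("cpu", {"matmul": 10, "conv2d": 12, "softmax": 5, "layernorm": 4, "tokenize": 1}),
]

def dispatch(op: str) -> str:
    # Pick the cheapest backend that supports this operator.
    candidates = [(b.cost[op], b.name) for b in BACKENDS if op in b.cost]
    if not candidates:
        raise ValueError(f"no backend supports {op}")
    return min(candidates)[1]

for op in ["tokenize", "matmul", "softmax", "matmul", "layernorm"]:
    print(f"{op:10s} -> {dispatch(op)}")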
Chasing the Leader
AI emerged on the back of GPU processing power, because the functions needed for graphics are very similar to those at the core of AI. In addition, a software toolchain was created to map non-graphics functions onto the architecture, making NVIDIA GPUs the easiest processors to target.
"When someone becomes a market leader, they may be the only competitors in the market, and everyone will try to respond to them," said Chris Mueth, New Opportunities Business Manager at Keysight. But this does not mean it is the optimal architecture. We may not be aware of this yet. GPUs are suitable for certain applications, such as performing repetitive mathematical operations, which are difficult to surpass. If you optimize the software to work with GPUs, the speed will be very fast
Being the leader in general-purpose accelerators can create drag. "If you want to build a general-purpose accelerator, you have to think about future-proofing," said Russell Klein, Program Director at Siemens EDA. "When NVIDIA sits down to build a TPU, they have to make sure it addresses the broadest possible market, which means anyone who dreams up a new neural network needs to be able to drop it into that accelerator and run it. If you are building something for one specific application, there is far less need for future-proofing. I may want to build in a little flexibility so I have room to fix problems. But if it is fixed as a specific implementation that does one job very well, then in 18 months someone will come along with a brand-new algorithm. The good news is that my custom implementation will stay ahead of everyone else until they can catch up with custom implementations of their own. What we can do with off-the-shelf hardware is very limited."
But specificity can also be layered. "Part of what gets delivered with an IP is a hardware abstraction layer, which is exposed to software in a standardized way," Schirrmeister said. "Without the middleware, a graphics core is useless. Application specificity moves up in abstraction. If you look at CUDA, the compute in the NVIDIA cores themselves is fairly generic. CUDA is the abstraction layer, and on top of it sit libraries for all kinds of domains, biology among them. That is great, because the application specificity has moved up to a higher level."
Those abstraction layers mattered in the past, too. "Arm consolidated a software ecosystem on top of its application processors," said Sharad Chole, Chief Scientist and Co-Founder of Expedera. "From there, heterogeneous computing let everyone build their own additions on top of that software stack. Qualcomm's stack, for example, is completely independent of Apple's stack. If you extend it, there is an interface exposed for better performance or a better power profile, and that leaves room for co-processors. Co-processors allow much more differentiation than heterogeneous computing alone, because you can add or remove one, or build an updated co-processor, without spinning a new application processor, which would be far more expensive."
Economics is a major factor. "The prevalence of fully programmable devices that accept C++ or other high-level languages, along with domain-focused GPUs, GPNPUs, and DSPs, has reduced the need for dedicated, fixed-function, financially risky hardware acceleration blocks in new designs," said Quadric's Roddy.
This is as much a business issue as a technical one. "Someone may say, 'I want to target this very specific application, and in that case I know I will be doing the following things in the AI stack, or some other stack, and you just have to make those work,'" said Blue Cheetah's Alon. "If the market is large enough, that can be an interesting choice for a company. But for an AI accelerator or AI chip startup it is a trickier bet. If there is not enough of a market to justify the whole investment, you have to predict the capabilities required by markets that do not yet exist. It really comes down to what business model and what bet you are taking, and what technical strategies you can adopt to optimize for it as best you can."
The case for dedicated hardware
Hardware implementation requires choices. "Even if we could standardize on a set of neural networks and say that is all we need to run, you would still have to consider the number of parameters, the number of operations required, and the latency target," said Expedera's Chole. "But it has never been like that, especially in AI. We started with postage-stamp-sized 224 x 224 images, then moved to high definition, and now we need to move to 4K. The same is true for LLMs. We started with roughly 300-million-parameter models such as BERT, and now we are moving to billions, hundreds of billions, even trillions of parameters. We began with language translation and token-prediction models; now we have multimodal models that handle language, vision, and audio at once. The workload keeps evolving, and that is the chase that is going on."
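A rough calculation shows why that growth keeps outrunning fixed hardware. The model sizes and byte widths below are illustrative round numbers, not benchmarks; the sketch simply multiplies parameter count by bytes per parameter to estimate weight storage:

MODEL_SIZES = {            # parameters (illustrative round numbers)
    "BERT-class": 3e8,
    "7B LLM": 7e9,
    "70B LLM": 7e10,
    "1T-class LLM": 1e12,
}
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

for model, params in MODEL_SIZES.items():
    line = ", ".join(f"{p}: {params * b / 2**30:,.1f} GiB" for p, b in BYTES_PER_PARAM.items())
    print(f"{model:14s} {line}")
# An accelerator whose memory system was sized around the first row
# is hopelessly undersized for the last one.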
Many aspects of existing architectures are worth questioning. "The key to designing a good system is to identify the major bottlenecks in system performance and find ways to accelerate them," said Dave Fick, CEO and co-founder of Mythic. "AI is an exciting and far-reaching technology, but it requires performance in the trillions of operations per second and memory bandwidth that standard cache and DRAM architectures cannot support. That combination of usefulness and difficulty makes AI a prime candidate for specialized hardware."
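Fick's point about memory bandwidth can be made concrete with a simplified estimate. Assume, as a deliberate simplification, that batch-1 LLM inference streams the full weight set from memory once per generated token with no reuse; the figures below are illustrative, not measurements:

def required_bandwidth_gbs(params, bytes_per_param, tokens_per_sec):
    # Bandwidth (GB/s) needed to stream all weights once per generated token.
    return params * bytes_per_param * tokens_per_sec / 1e9

# A 70B-parameter model with 8-bit weights at 20 tokens/s:
print(required_bandwidth_gbs(70e9, 1, 20))  # ~1,400 GB/s
# A couple of channels of commodity DDR5 provide on the order of 100 GB/s,
# roughly an order of magnitude short, which is why HBM, large on-chip SRAM,
# or analog in-memory compute become attractive.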
The sheer inability of general-purpose devices to meet demand may be what forces the industry toward more efficient hardware. "Progress in generative AI is happening very fast," Chole said. "Right now there is nothing that meets its hardware requirements in terms of cost and power. Nothing. Even GPU shipments are not enough. The orders are there, but the shipments are not, and that is a problem everyone can see. There simply is not enough compute to truly support generative AI workloads."
Chiplets may help ease the problem. "The coming chiplet tsunami will accelerate this transformation of the data center," Roddy said. "As chiplet-based packages displace monolithic ICs, the ability to mix and match fully programmable CPUs, GPUs, GPNPUs (general-purpose programmable NPUs), and other processing engines for specific tasks will hit data centers first, and then, as the cost of chiplet packaging inevitably falls with rising volume, slowly radiate out to higher-volume, more cost-sensitive markets."
Multiple markets, multiple trade-offs
While most of the attention is on training new models in large data centers, the ultimate payoff will come from the devices that run inference on those models, and those devices cannot afford the enormous power budget of training. "The hardware used to train AI is somewhat standardized," said Marc Swinnen, Director of Product Marketing at Ansys. "You buy NVIDIA chips, and that is how you train your AI. But once you have the model, how do you execute it in the end application, perhaps at the edge? That is typically a chip tailored to a specific implementation of that AI algorithm. The only way to get a fast, low-power execution of an AI model is to build a custom chip for it. AI is going to be a huge driver for custom hardware that executes these models."
Those devices face a similar series of decisions. "Not every AI accelerator is the same," said Mythic's Fick. "There are many great ideas for addressing the memory and performance challenges that AI brings. In particular, there are new data types going all the way down to 4-bit floating point or even 1-bit precision. Analog computing can be used to deliver extremely high memory bandwidth, improving performance and energy efficiency. Others are looking at pruning neural networks down to their most critical parts to save memory and compute. Each of these techniques yields hardware that is strong in some areas and weak in others, which means more hardware/software co-optimization and the need for an ecosystem with a variety of AI processing options."
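As one example of the data-type trade-off Fick mentions, the sketch below applies symmetric per-tensor weight quantization, a generic textbook scheme rather than any particular accelerator's method. Lower bit widths cut storage and bandwidth needs but introduce error that the hardware/software stack has to absorb:

import numpy as np

def quantize(w, bits):
    # Symmetric per-tensor quantization: map weights onto signed integers.
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for int4, 127 for int8
    scale = float(np.abs(w).max()) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax), scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4096, 4096)).astype(np.float32)
for bits in (8, 4):
    q, scale = quantize(w, bits)
    err = float(np.abs(w - q * scale).mean())
    print(f"int{bits}: {bits / 32:.1%} of fp32 storage, mean abs error {err:.2e}")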
This is exactly where the interests of AI and RISC-V intersect. "A software task such as an LLM could become dominant enough to drive new hardware architectures, but differentiation will not stop entirely, at least not in the near term," said Sigasi CEO Dieter Therssen. "Even RISC-V customization is driven by the need to do some CNN or LLM processing. A key factor is how AI will be deployed. Right now there are too many ways of doing that, so convergence is still out of reach."
Conclusion
AI is so new and evolving so quickly that no one has definitive answers. What is the best architecture for today's applications? Will future applications look similar enough that existing architectures just need to be extended? Assuming so may seem naive, but for many companies it may be the best bet available today.
GPUs, and the software abstractions built on top of them, made the rapid rise of AI possible. They have provided a sufficient framework for the extensions seen so far, but that does not make them the most efficient platform. Model development is being pushed, to some degree, in directions that existing hardware supports well, but as more architectures appear, AI and model development may diverge according to the hardware resources available and their appetite for power. Power is likely to become the dominant factor on both sides, since current projections have AI soon consuming a significant share of global electricity generation. That cannot continue.