Can the United States Control the Open-Source RISC-V Architecture?
October 16th – RISC-V is no longer content with disrupting the CPU industry. It is waging war on every processor integrated into SoCs or advanced packaging, an ambitious plan that pits it against well-funded incumbents and their well-established ecosystems.
When Calista Redmond, CEO of RISC-V International, declared “RISC-V will be everywhere” at last year’s summit, most people probably thought she was referring to CPUs.
Clearly, the organization intends to drive RISC-V into servers and deep embedded devices, but its goals go far beyond that.

Redmond hinted that every processing core, GPU, GPGPU, AI processor, and all other types of processors can be based on RISC-V. Krste Asanović, a professor at the University of California, Berkeley, and the Chairman of RISC-V International, made this even clearer in his presentation.
With the recent completion of security and encryption standards work by RISC-V International, this vision is beginning to take shape. RISC-V International is forming teams and reviewing contributions to enhance support for matrix multiplication, a fundamental function for GPUs and AI processors.
Behind these bold statements lies a fundamental shift in data and computing architecture. It’s no longer about which company has the fastest CPU because, no matter how well-designed it is, all CPUs have limitations.
“In some vertical markets, such as 5G/6G, inference, and video processing, their computational workloads have outgrown traditional CPUs,” said Russell Klein, Project Director at Siemens EDA Catapult HLS team. “This is where we see the adoption of new computing approaches.”
Almost every application has some form of control structure. “From the perspective of memory access, graphics are a very different beast with very specific requirements,” said Frank Schirrmeister, Vice President of Solution and Business Development at Arteris. “If you look at some recent releases of AI and RISC-V, you’ll find that there are ISAs underpinning the processing elements companies are releasing.”
In some cases, these need to be properly explained. “RISC-V has something called vector extension,” said Charlie Hauck, CEO of Bluespec. “Depending on how you implement this, you can get something that looks a lot like a GPU, especially for parallel or SIMD-type small units.”
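The SIMD execution Hauck describes can be pictured as one instruction applied across many lanes at once, with a mask selecting which lanes participate. The following is a minimal Python sketch of that idea; the function name and lane model are illustrative, not part of any RISC-V specification.

```python
# Toy model of SIMD execution: one logical instruction, many lanes.
# Names here are invented for illustration only.

def vector_add(a, b, mask=None):
    """Apply a single 'add' operation across all lanes at once.

    `mask` selects the active lanes, as a predicated vector unit
    would; inactive lanes pass the first operand through unchanged.
    """
    if mask is None:
        mask = [True] * len(a)
    return [x + y if m else x for x, y, m in zip(a, b, mask)]

# Eight "lanes" processed by one logical instruction -- the
# parallel, SIMD-style execution described above.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8
print(vector_add(a, b))                          # all lanes active
print(vector_add(a, b, mask=[True, False] * 4))  # predicated lanes
```

Scaling this from a handful of lanes to hundreds of small parallel units is what starts to resemble GPU-style execution.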
However, this path is not without its challenges. “Adding GPU functionality to the RISC-V architecture through instruction extensions is attractive because GPUs play a crucial role in the field of artificial intelligence,” said Fujie Fan, R&D Director at Stream Computing. “However, we are also aware of the inevitable issues in architecture and ecosystem.”
Skeptics abound. The history of processors is littered with failed startups that claimed they would beat competitors with a new computing architecture. Many failed to consider that the competition is not standing still, that the computing environment is changing rapidly toward new methods and tools, and that retraining engineers carries real pain and cost.
Dhanendra Jani, Vice President of Engineering at Quadric, said, “The value RISC-V brings to adopters is in control processing where it has readily available open-source tools, operating systems (Linux or real-time), and a promise of long-term software compatibility and portability through ISA generality.”
“Graphics processing is a very different challenge – a domain-specific processing challenge. Adapting the basic RISC-V instruction set into one suitable for GPU tasks requires substantial investment in defining custom ISA extensions, making highly complex microarchitectural changes, and performing ‘major surgery’ on the open-source tools until they bear little resemblance to the originals.”
So using RISC-V for GPU tasks would dilute almost all of RISC-V’s inherent advantages: the extensive customization required forfeits most of the benefits, while the core ISA may still impose constraints that limit its usefulness in a GPU context. In essence, what’s the point of starting from RISC-V instead of a clean slate?
So what is RISC-V’s plan? Mark Himelstein, Chief Technical Officer of RISC-V International, said, “Vector is SIMD computation, which allows you to perform computations on multiple data points simultaneously and enables the chip to figure out the best way to fetch something from memory, process a single instruction, and put things back into memory or move them to the next operation. The missing fundamental function is matrix multiplication.”
“We’ve had multiple proposals, and one of them is something like the vector extensions in 32-bit instructions. This is very hard; it requires setup: you set strides and masks and then pull the trigger to operate. But if you want to be competitive with larger matrix implementations on other architectures, you need wider 64-bit instructions. That’s what many people are talking about.”
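The gap Himelstein points at can be made concrete: with only vector primitives, a matrix multiply has to be assembled from repeated strided row sweeps, which is the loop structure a dedicated matrix extension would collapse into fewer, wider instructions. The sketch below, in plain Python with invented helper names, is only an illustration of that decomposition.

```python
# Sketch: building matrix multiply out of vector (SIMD) primitives.
# `vec_scale_add` and `matmul` are illustrative names, not real ISA ops.

def vec_scale_add(acc, row, scalar):
    """One vector op: acc += scalar * row, across every element."""
    return [a + scalar * r for a, r in zip(acc, row)]

def matmul(A, B):
    """C = A @ B expressed as repeated vector operations.

    Each output row is built by sweeping the rows of B (a stride of
    one row at a time), scaled by the matching element of A. A matrix
    extension would express this inner loop far more compactly.
    """
    n = len(B[0])
    C = []
    for a_row in A:
        acc = [0] * n
        for scalar, b_row in zip(a_row, B):
            acc = vec_scale_add(acc, b_row, scalar)
        C.append(acc)
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```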
The question is how much complexity is exposed and how much remains hidden. “The ISA is a critical component,” said Anand Patel, Senior Director of Arm’s Client Business Line. “However, the complexity of GPUs is typically abstracted by standard APIs like Vulkan or OpenCL. This makes it easier for developers to target multiple vendors while leaving lower-level optimizations to GPU vendors. Even in GPGPU-type applications, GPU architecture is evolving rapidly to keep up with emerging use cases such as AI processing. Developers have access to a mature software ecosystem to keep pace with these changes. Standard APIs ensure developers don’t have to worry about ISA changes but can transparently benefit from the advantages of these underlying improvements.”
RISC-V: Beyond the Control of Any Single Company or Nation
Macroarchitecture vs. Microarchitecture
It’s crucial to separate these two concerns because RISC-V only defines the macroarchitecture, leaving all microarchitecture decisions to the implementers. When moving beyond the CPU, this becomes a more significant issue. “Von Neumann, in some respects, is limiting,” said Himelstein of RISC-V, “but how a specific implementation interacts with memory is not decided by RISC-V.”
“Most GPU implementations optimize this through memory in a multi-stage pipeline. Some things come from memory, and some operations are underway. When you start looking at GPUs, you’re talking about exposed memory interactions. We do have some restrictions on the order in which things happen because you want to ensure well-defined operations.”
There are many ways to approach the issue. “The most advanced GPU products can be divided into traditional graphics processing and modern AI acceleration,” said Klein. “The former is more like a programmable ASIC than a general-purpose processor, with its core capabilities coming from the implementation of stream processors rather than the ISA. The instruction set is typically invisible to programmers, always taking a backseat. The design of graphics processors is closely tied to microarchitecture and suited to implementing customized instructions with clients. For most of us, standardization of multimedia capabilities is more attractive, and replicating the GPU is not the only way to get there. For RISC-V, multimedia capabilities can be achieved through a vector architecture, and AI capabilities through a more efficient heterogeneous architecture with a matrix accelerator.”
Some aspects change when external programmers need to write software for your devices. “Dataflow processing can be done in several ways,” Klein said. “One is to use small general-purpose processors, or even pipelined specialized processors, each handling one stage of a problem. This is clearly faster and more efficient than a single large CPU. Using programmable processors as computing elements preserves significant flexibility but gives up some performance and efficiency. This approach could be built from any capable multi-core processor. The problem is that it has been thoroughly rejected by the software development community, which is unwilling to give up its single-threaded programming model.”
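The staged dataflow Klein describes, where each small processor handles one step and data streams through the chain, can be sketched with Python generators; the stage names below are invented purely for illustration.

```python
# Toy dataflow pipeline: each "small processor" is one stage, and
# data streams through the chain element by element. Stage names
# are assumptions for illustration, not from any real design.

def scale(stream, factor):
    """Stage 1: multiply every sample by a constant."""
    for x in stream:
        yield x * factor

def clamp(stream, lo, hi):
    """Stage 2: saturate samples into a fixed range."""
    for x in stream:
        yield max(lo, min(hi, x))

def accumulate(stream):
    """Stage 3: running sum of everything seen so far."""
    total = 0
    for x in stream:
        total += x
        yield total

samples = range(10)
# Chain the stages; each element flows through all of them, the way
# work flows through a pipeline of specialized processing elements.
pipeline = accumulate(clamp(scale(samples, 3), 0, 20))
print(list(pipeline))
```

The point of contention in the quote is visible here too: nothing in this chain looks like a single-threaded program, which is exactly the model many software teams are reluctant to give up.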
This is a big problem for many companies. “If you’re looking for a general-purpose processor, it can be whatever the application requires: a single-issue, two-issue, or three-issue microcontroller, all the way up to a multi-core, multi-issue superscalar design running many cores,” said Hauck of Bluespec. “Or you’ll see designs with 4,096 RISC-V processors, each a small, stripped-down RV32I-type core, pulled together by a specific system architecture and interconnect so they work in the spirit of a GPU. They are many smaller integer units cooperating on a massive task. The challenge is: how do you develop software for this?”
With greater flexibility, new approaches may be needed. “In large HPC, if you run workloads more oriented toward data centers, it has a certain character,” said Andy Meyer, Chief Product Marketing Manager at Siemens EDA. “But if your application is scientific computing, there may be particular features around loads and stores and the ability to extend to many types of math operations. If people choose this path, the ecosystem will face some challenges. The major area of growth is large-scale applications. If you look at where the venture capital is going, you will see it is clearly addressing a unique problem.”
Software and Ecosystem
For decades, hardware/software co-design has been a goal, and RISC-V is one of the few areas where the concept is making progress. “Traditional designs have strived to keep hardware and software separate,” Klein said. “Hardware is created and handed off to the software engineers. If the hardware is general enough, software will be able to do whatever it needs to provide the system’s functionality. If you have enough headroom in computing power and power budget, this is feasible. I won’t say it’s efficient, but it works, even though it’s quite wasteful.”
Domain-specific computing is starting to change that. “Truly harnessing the potential of dataflow processors means customizing for specific applications,” Klein added. “That means hardware and software teams need to work together to succeed, which makes many organizations and design teams very uncomfortable.”
Sometimes, co-design is the only way. “Suppose you need to do some processing at the edge,” said Hauck of Bluespec. “There will always be limits on shape, size, or power. No amount of software innovation will get you there; the software stack is what it is. You won’t reach a solution under those kinds of restrictions through software optimization alone. You have to start from the hardware.”
When creating embedded systems, the processor is less likely to be exposed to a broad programming audience, so it can be optimized more aggressively. “Consider the completed vector cryptography work,” Himelstein said. “No one programs vector cryptography directly in their applications. That’s not what they do. They use libraries like libssl or other cryptographic libraries, and the libraries use those instructions, sometimes by dropping into assembly language, and then provide a C, C++, or Java interface so software and applications can utilize them.”
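The layering Himelstein describes is the same one application code relies on everywhere: programs call a crypto library through a high-level interface, and the library is free to use whatever accelerated instructions the hardware provides underneath. A small sketch using Python's standard `hashlib` and `hmac` modules (the message and key are made up for the example):

```python
# Applications rarely touch crypto instructions directly: they call
# a library, and the library decides whether the underlying SHA-256
# runs as plain scalar code or via accelerated hardware instructions.
import hashlib
import hmac

message = b"telemetry packet"   # illustrative payload
key = b"shared-secret"          # illustrative key

digest = hashlib.sha256(message).hexdigest()
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The same Python code runs unchanged on any machine; the library
# boundary hides the ISA, which is the layering described above.
print(digest, tag)
```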
It becomes more challenging when general programming is needed. “If you look at the GPU ecosystem, the toolchain is controlled by NVIDIA,” said Fujie Fan. “Other competitors, including AMD, have tried to break the monopoly but have failed. Standardizing an entire GPU through RISC-V standard instruction-set extensions while staying compatible with NVIDIA’s continuously updated ecosystem is almost impossible. On the other hand, starting over is also difficult because NVIDIA has a first-mover advantage.”
When Will It Succeed?
Despite this, RISC-V is all about innovation. “Many of the reasons traditional solutions are currently seen as the best solutions are historical,” Hauck said. “The real place where smart architects and smart software developers will shine is in RISC-V-type environments.”
It starts with a common need. “If there’s a need, people will come together and cooperate, and that’s what RISC-V is: cooperation,” said Meyer. “You can see one initiative after another happening all over the world. The ecosystem will continue to evolve, but there’s a balance between the business side and supporting the community.”
This may bring business challenges, especially where the return on investment is low. “RISC-V will take some time to catch up with and compete against existing products and ecosystems,” Hauck said. “But you will start to see that for some applications, if companies support them correctly, there is no reason RISC-V processors shouldn’t succeed. There are many excellent software developers, and they will eventually succeed because the community already has all the tools needed for innovation.”
So, how long until we see RISC-V GPUs and AI processors? “If you want reasonable AI functionality in a non-GPU world, you already have it today,” Himelstein said. “However, the full complement of matrix support and everything else these groups have been asking for will probably appear in basic form in about a year and a half, and in more advanced form in three to four years.”
Incremental approaches can speed adoption of these features. “Rather than standardizing the entire GPU product, it’s better to standardize each GPU function separately,” said Fujie Fan. “As for AI capabilities, we believe the ongoing RISC-V matrix extension is a better choice for IC designers.”