The polarization of AI computing power: how ordinary developers can break through in 2026

📅 2026-05-18 11:34:28 👤 DouWen Editorial

An open fact of the AI industry in 2026 is that top-tier computing power is highly concentrated in the hands of a few leading companies. Ordinary developers face several dilemmas at once: they cannot afford large clusters, they cannot control API prices, and the small models they can run locally are limited in capability. This article does not cite specific GPU counts, which change frequently and are measured inconsistently. Instead, it explains at a structural level the logic behind the polarization of AI computing power, and the paths ordinary developers can still take to break through in 2026.

The realistic picture of computing power polarization

It is widely accepted in the industry that the gap in available high-end GPUs between a leading AI company and an ordinary research lab, a ten-person startup, or an independent developer is several orders of magnitude, not merely a few times. The gap itself is not in dispute. Each company's exact card count is a trade secret, and public figures are mostly estimates and media speculation, so I will not quote them here. I will only state one fact: your computing power cannot be meaningfully compared with that of OpenAI, Anthropic, or Google.

On price, a single H100-class GPU sells on the open market for tens of thousands of dollars, and an eight-card machine costs more still; both fluctuate with channel and time. On the rental market, platforms such as RunPod, Lambda Labs, Vast.ai, and CoreWeave bill by the hour; check each platform's current pricing page for exact rates. A common-sense conclusion is that the GPU bill for a month-long medium-sized experiment can easily reach tens of thousands of dollars, which is unaffordable for individuals and a heavy burden for small companies.

On access, new-generation high-end GPUs have long been in short supply. Large customers get priority, small customers wait in line, and H100 instances on public clouds are often out of stock. This supply-side structure itself widens the computing power gap.

Why does this polarization occur?

The first reason is scaling laws. The compute needed to train a flagship large model grows non-linearly with parameter count and data volume, so players with little compute cannot enter this game at all. The second reason is supply-chain priority. Nvidia's shipments tend to go to large customers first, so leading companies get new cards more easily. The third reason is the revenue feedback loop. Leading companies earn substantial API and subscription revenue and can keep reinvesting it in hardware; small companies have no such closed cash-flow loop.

Add export controls, energy constraints, CUDA ecosystem lock-in, and the high valuations that capital markets give leading companies, and these factors together raise the compute threshold to a height ordinary developers cannot cross. This is a structural problem, not a temporary imbalance. Any idea of "catching up in another two years" is unrealistic.

Accept reality and choose a track that suits you

The first thing ordinary developers should do is not fight the leading companies head-on, but accept reality and reposition themselves.

What not to do: do not train a foundation model from scratch that aims to match mainstream closed-source models; the investment this requires exceeds what most teams can afford. Do not start a price war with the giants on general-purpose API pricing; they can spread costs over scale, and you cannot.

What can be done falls into several categories: application-layer innovation built on leading models, which is the most valuable space for ordinary developers; vertical-industry fine-tuning, which turns general models into domain experts; model optimization and compression, which lets large models run on small devices with lower latency and cost; and agent workflows and system architecture, where engineering skill matters more than compute.

Choosing a track where you have a comparative advantage is the first step to breaking through.

Breakthrough path one: use APIs, with no self-training and no self-hosting

For most developers, directly calling the cloud API is the most cost-effective solution.

The cost difference between training your own model and calling an API directly is usually two to three orders of magnitude or more. The exact figure varies with model and workload, so no precise quote is given here, only the order of magnitude. Flagship APIs available overseas come from OpenAI, Anthropic, and Google. APIs available in China include DeepSeek, Moonshot, Zhipu, Alibaba Tongyi, and ByteDance Doubao, and most of them offer OpenAI-compatible protocols.

Scenarios suited to a pure-API approach include conversation, Q&A, document summarization, code generation, data analysis, customer service, marketing copywriting, and almost any application without special data-compliance requirements. Three best practices: enable prompt caching so that cache hits are billed at a lower rate; route across multiple models, sending simple queries to small models and complex tasks to the flagship; and use the batch API for non-real-time tasks wherever possible, which can cut costs by a considerable proportion.

The shortcomings of APIs should also be clear: sensitive data may leave the company, vendors may change prices, and models may be retired. The suggestion is to get the business running on an API first and only then consider building your own. For most use cases, the API is enough.
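
The multi-model routing practice can be sketched in a few lines. This is only an illustration under stated assumptions, not a production router: the model names `small-model` and `flagship-model`, the keyword list, and the length threshold are all hypothetical placeholders, and a real system would likely use a classifier or measured quality data instead of keywords.

```python
# Minimal sketch of multi-model routing. All names and thresholds below are
# hypothetical placeholders, not real model identifiers or recommended values.

SMALL_MODEL = "small-model"        # hypothetical cheap model
FLAGSHIP_MODEL = "flagship-model"  # hypothetical flagship model

# Assumed heuristic: words that hint at multi-step or high-effort work.
COMPLEX_HINTS = ("prove", "refactor", "debug", "analyze", "plan")
MAX_SIMPLE_CHARS = 400  # assumed cut-off for a "short" query


def pick_model(query: str) -> str:
    """Return the name of the model a query should be sent to."""
    text = query.lower()
    # Long inputs go to the flagship regardless of content.
    if len(text) > MAX_SIMPLE_CHARS:
        return FLAGSHIP_MODEL
    # Short inputs still go to the flagship if they look complex.
    if any(hint in text for hint in COMPLEX_HINTS):
        return FLAGSHIP_MODEL
    return SMALL_MODEL


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Debug this stack trace and plan a fix"):
        print(pick_model(q), "<-", q)
```

Because most domestic and overseas vendors expose OpenAI-compatible protocols, the returned name can usually be passed straight into the `model` field of the request, so the router stays independent of any one vendor.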

Breakthrough path two: local inference with small open-source models

If you are concerned about data compliance, or want more control over the model pipeline, local inference is a mature path.

The open-source ecosystem is already very rich in 2026. Meta's Llama series, Alibaba's Qwen series, DeepSeek, Mistral, Google's Gemma, and others are all available in multiple sizes. Which version fits best, and at what parameter count, depends on what each project's official repository currently publishes. As a rough guide to hardware for local inference: consumer-grade graphics cards can run small models, cards with mid-range VRAM can run quantized medium-sized models, and data-center GPUs are suited to large models at full precision.

On tooling, Ollama is a convenient way for individuals to get started locally, LM Studio provides a GUI, and vLLM and SGLang are production-grade inference engines that significantly outperform naive implementations in throughput and concurrency.

Suitable scenarios: local experiments, privacy-sensitive conversations, internal corporate knowledge bases, and offline use. The drawback is that small and medium-sized open-source models still lag behind the leading closed-source flagships in complex reasoning. The exact gap varies from one leaderboard to another, so no figures are quoted here; it can only be said that the gap is small for everyday completion and summarization tasks but obvious in complex agent chains.
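
The hardware tiers above follow from simple arithmetic: the memory needed for model weights is roughly parameter count times bits per weight. The sketch below shows that calculation. The 7-billion-parameter size is an illustrative assumption, not a reference to any specific release, and the estimate covers weights only; KV cache, activations, and runtime overhead come on top.

```python
# Back-of-the-envelope estimate of memory needed for model weights alone.
# Excludes KV cache, activations, and framework overhead.

def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB for a model of the given size."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Illustrative model size (assumption), at three common precisions.
    for bits in (16, 8, 4):
        gib = weight_memory_gib(7, bits)
        print(f"7B parameters at {bits}-bit: about {gib:.1f} GiB")
```

This is why quantization moves a model from one hardware tier to another: going from 16-bit to 4-bit weights cuts the weight footprint to a quarter.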

Breakthrough path three: rent GPUs for limited training

If the business genuinely requires fine-tuning or small-scale training, renting GPUs is a reasonable compromise.

Mainstream platforms include RunPod, Lambda Labs, Vast.ai, CoreWeave, and AutoDL in China. Prices fluctuate with GPU model, contract length, and market supply and demand; check each platform's current pricing. The break-even point between renting and buying depends on your actual hours of use per month. Buying is more cost-effective for sustained heavy use, while renting is more flexible for short-term experiments.

With parameter-efficient fine-tuning methods such as LoRA or QLoRA, a few cards running for a few days can produce usable results in a vertical domain, which is well within reach of ordinary teams. Pre-training even a moderately large model from scratch, however, requires GPU time far beyond the budget of individuals and small teams, so do not take that route.

For the tool stack, use HuggingFace Transformers with PEFT, plus DeepSpeed or FSDP for distributed training; at the framework layer, choose packaged scaffolding such as Axolotl or LLaMA Factory. Get the whole pipeline running on one card first and then scale to multiple cards. Otherwise, debugging on multiple cards will cost more than the training itself.
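
The rent-versus-buy break-even point can be worked out with a small calculation. The sketch below is a simplification under stated assumptions: it ignores resale value, depreciation, and financing, and every number in the example is an arbitrary placeholder rather than a market price. Substitute the current figures from the platform and vendor you are actually considering.

```python
# Rent-vs-buy break-even sketch. Ignores resale value, depreciation, and
# financing. All example numbers are arbitrary placeholders, not quotes.

def break_even_hours(purchase_price: float, rental_per_hour: float,
                     owning_cost_per_hour: float = 0.0) -> float:
    """Hours of use after which buying becomes cheaper than renting."""
    saving_per_hour = rental_per_hour - owning_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("owning costs at least as much per hour as renting")
    return purchase_price / saving_per_hour


def months_to_break_even(purchase_price: float, rental_per_hour: float,
                         hours_per_month: float,
                         owning_cost_per_hour: float = 0.0) -> float:
    """Months needed to reach break-even at a given monthly usage."""
    hours = break_even_hours(purchase_price, rental_per_hour,
                             owning_cost_per_hour)
    return hours / hours_per_month


if __name__ == "__main__":
    # Placeholder inputs: card price, hourly rental, hourly power and hosting.
    for usage in (100, 600):
        months = months_to_break_even(12000, 2.5, usage, 0.5)
        print(f"{usage} hours/month -> break-even after {months:.0f} months")
```

With these placeholder inputs, light use takes years to justify a purchase while near-continuous use pays back within a year, which is the pattern the paragraph above describes.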
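
Why LoRA is affordable comes down to parameter counts. For a weight matrix of shape d_out x d_in, LoRA freezes the original weights and trains two low-rank matrices of shapes d_out x r and r x d_in instead. The sketch below compares the two counts; the 4096 x 4096 layer size and rank 8 are illustrative assumptions, not values taken from any particular model.

```python
# Compare trainable parameters: full fine-tuning vs. LoRA for one weight matrix.
# Layer size and rank in the example are illustrative assumptions.

def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning the full weight matrix."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank LoRA matrices (d_out x r and r x d_in)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8
    full = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.4%}")
```

Fewer trainable parameters means far less optimizer state and gradient memory, which is what brings fine-tuning down to a handful of cards.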

Breakthrough path four: specialize in agents and workflows

A judgment worth repeating in 2026 is that the value of agent engineering is rising quickly. Model capability is now sufficient for many tasks; the real bottleneck is how to orchestrate multi-step reasoning, call tools, handle errors, maintain long-term memory, and coordinate multiple agents.

Mainstream frameworks include LangChain, LlamaIndex, LangGraph, CrewAI, and AutoGen. Each has its own focus, and which one to choose depends on the complexity of your workflow. Widely discussed products such as Cursor, Claude Code, and Devin are essentially examples of agent engineering. What sets them apart is not the model itself but the orchestration layer above it and the engineering details.

Business value: an agent system that solves a specific business problem creates far more value than training a general model from scratch. Customer-service automation, contract analysis, code review, and data cleaning are all high-demand directions in 2026.

In terms of skills, if you learn one or two agent frameworks in depth, tune a RAG system well, become familiar with at least one vector database, and handle the edge cases of tool calls reliably, you can essentially enter this track. This path needs little compute; software engineering skill is the core, and ordinary developers can compete fully.
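
Handling the edge cases of tool calls mostly means never trusting the model's output. The sketch below is framework-independent and makes several assumptions: the tool call arrives as a JSON string with `name` and `arguments` fields (a common shape, but not any specific vendor's schema), and `get_weather` is a hypothetical tool invented for illustration. Every failure is turned into a structured error that the agent loop can feed back to the model instead of crashing.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool used only for illustration."""
    return f"Sunny in {city}"


# Registry of callable tools, keyed by the name the model is told to use.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(raw_call: str) -> Dict[str, Any]:
    """Execute a model-produced tool call given as a JSON string.

    Returns {"ok": True, "result": ...} on success, or
    {"ok": False, "error": ...} for any malformed or failing call.
    """
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}

    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:  # typically wrong or missing arguments
        return {"ok": False, "error": f"bad arguments: {exc}"}
    except Exception as exc:  # the tool itself failed
        return {"ok": False, "error": f"tool failed: {exc}"}


if __name__ == "__main__":
    print(run_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
    print(run_tool_call('{"name": "delete_db", "arguments": {}}'))
    print(run_tool_call("not json at all"))
```

Returning errors as data rather than raising lets the orchestration layer decide whether to retry, ask the model to correct itself, or give up.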

Breakthrough path five: differentiation in vertical domains

Leading companies already do general-purpose LLMs well, but vertical domains are where ordinary developers have their opportunity.

In fields such as healthcare, law, finance, education, industrial settings, and government services, general models often fall short. The reason is not compute but a lack of professional data, domain context, and understanding of compliance. In these fields, what creates real value is not a stronger general model but a team that knows the industry, can obtain compliant data, and can identify specific pain points.

The advantages ordinary developers have here are that they can stay close to the front lines of the industry, accumulate clean domain data, understand the customer's context, and find specific customers willing to pay. The barrier to starting is not high. With a vertical agent plus a small model fine-tuned for the domain, a few people can build a usable MVP in a few months.

Breakthrough path six: optimization and compression engineering

Once a model is trained, making it run cheaper and faster is an engineering discipline in its own right. Quantization, pruning, distillation, KV cache optimization, FlashAttention, and continuous batching each offer plenty of engineering room and a talent gap.

Every company that uses LLMs needs inference engineers to bring costs down. Ordinary developers can start on consumer-grade GPUs and study topics such as vLLM, quantization algorithms, and attention optimization; within a few months they can build skills that directly meet enterprise needs. This is the path that demands the least compute yet offers high returns.
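
To make the first item concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in pure Python. It is a teaching example under simplifying assumptions: production schemes usually quantize per channel or per group, run on tensors rather than lists, and often use lower bit widths. The input values are arbitrary.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.1, -0.8, 0.33, 2.0]  # arbitrary example values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", codes)
    print(f"scale: {scale:.6f}  worst-case error: {worst:.6f}")
```

Each weight now takes one byte instead of two or four, at the price of a rounding error of at most half the scale per value. Managing that accuracy trade-off is where most of the engineering work lies.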

Long-term strategy for programmers facing the computing power gap

Three principles. First, do not compete head-on on foundation models; this is not a track an individual can win. Second, work where the leading models are weak: verticals, applications, engineering, and agents are home ground for ordinary developers. Third, keep up your understanding of the underlying technology. Even if you never train a model yourself, you should understand transformers, RAG, fine-tuning, and quantization. That knowledge will help you use APIs more wisely than others do.

A reasonable rhythm for allocating time is to spend most of it on projects, set aside a small fixed amount to keep up with recent papers and tools, and run one small experiment in a new direction every quarter. Do not be anxious: the computing power gap is structural, and ordinary developers do not need to close it. What they need to build up is judgment and engineering ability.

FAQ

Can I still learn AI without an H100?

Absolutely. A consumer-grade graphics card or an account with some cloud GPU hours is enough to learn most engineering practices. Running small models locally, understanding transformer internals, building applications on APIs, studying RAG systems, and doing small-scale LoRA fine-tuning do not require an H100. H100-class hardware matters mainly for training large foundation models, which only leading companies do.

Can ordinary companies still build foundation models?

Some teams are still building small foundation models, but the commercial value is low because the open-source ecosystem already offers many options. The foundation model market is essentially saturated. Llama, Qwen, Mistral, and others are all openly available, so there is no need to rebuild the same thing. The real business value lies in vertical fine-tuning and the application layer.

Are domestic GPU alternatives such as Huawei Ascend worth using?

They deserve attention, but the ecosystem is still catching up. Ascend is competitive in hardware performance, but it is not compatible with the CUDA ecosystem, so using it requires rewriting some code and kernels. It is worth using in China where domestically controllable hardware is required; overseas, Nvidia remains the main choice. In the medium to long term, the toolchains for Ascend and other Chinese accelerator cards will continue to improve.

Will the computing power gap turn AI into an oligopoly?

In the short term, concentration at the top is very high, but in the long term it may not stay that way. Three opposing forces are at work: open-source models keep approaching closed-source capabilities, the vertical application layer is not easily swallowed by the leaders, and the unit cost of compute keeps falling with each hardware generation. Most industries will not end up as an extreme oligopoly with only two or three suppliers.

Should I switch to AI or continue traditional development?

You do not need to change careers completely; you can do it gradually. First use AI tools in your existing development work to improve efficiency. Then learn RAG and agent engineering in your spare time and build a side project. After a few months you will have a better idea of whether you really want to work on AI applications full-time. Going all-in on a career change with no experience at all is high-risk.

Source of inspiration: Issue 391 of Ruan Yifeng's "Technology Enthusiasts Weekly" https://www.ruanyifeng.com/blog/2025/09/weekly-issue-391.html

📝 This article is from DouWen (www.douwen.me). Please keep this attribution when reposting.
