我应该一开始就用一家还是混合用三家

建议先选一家把业务跑通。混合调度会增加架构复杂度,要值得才做。如果业务场景单一,一家足够;如果业务场景多样,比如同时做编码助手和长文检索,可以在编码用 Claude、长文用 Gemini,这种混合是合理的。中小项目优先选简单,不要一上来就追求多模型路由。

国内开发者无法直接调用三家怎么办

大致三个方向。一是通过云厂商的合规版本,比如 Azure 中国版的 OpenAI 服务,AWS Bedrock 上的部分 Claude 模型。二是通过 OpenAI 兼容协议直接接入国产 API,例如 DeepSeek、Kimi、智谱等不少厂商都支持只换 endpoint。三是用海外法人公司直接接入,适合本身有海外业务的团队。多数中小开发者实际选择是直接用国产替代。

Claude 价格比 OpenAI 贵这么多还值得用吗

要看场景。在编码、长文档和需要高质量结构化输出的场景,Claude 一次性给对答案的概率更高,意味着重试更少,实际 API 总花费未必比便宜一档的对手高。日常聊天和高频简单请求用旗舰是浪费,选 Sonnet 或 Haiku 档位更合适。

Gemini 价格优势这么明显为什么不是默认选项

历史和生态原因。OpenAI 起步最早,SDK 兼容性最广,LangChain、LlamaIndex 等主流框架默认接 OpenAI。Vertex AI 又必须配 GCP,门槛比直连 OpenAI 高一些。但 2026 年 Gemini 旗舰实力已经追上来,价格优势明显,在长文档、原生多模态、批量请求等场景里越来越多新项目把它作为默认选项。

API key 泄露了怎么办

三层防护一起用。第一,key 永远不要进代码或 git,放环境变量或 secret manager。第二,生产 key 和开发 key 分开,生产 key 限制 IP 白名单或者使用受信任的服务端调用。第三,三家都支持设置每月支出上限,即便 key 泄露,损失也有封顶。一旦发现泄露,立刻在管理后台 revoke 并新建一个 key,然后排查日志看泄露原因。

Comparison of the three major APIs of OpenAI, Anthropic, and Google, and actual measurement of large model selection in 2026

📅 2026-05-18 11:24:10 👤 DouWen Editorial 💬 6 条评论 👁 3

The big model APIs of 2026 are already part of everyday infrastructure for developers. OpenAI, Anthropic, and Google are often compared side by side in the international market, but the differences are not small in terms of registration thresholds, model lineups, price ranges, speeds, compliance, and long-form support. This article will not cite the specific figures that may go wrong. It will only explain the choices of the three companies from the several dimensions that developers are concerned about when making actual selections. By the way, it will also talk about how domestic developers should deal with the issue of "not being able to connect to overseas APIs".

The general positioning of the three model lineups

All three vendors currently use a three-tier structure of "flagship + mid-range + extremely fast and cheap". Specific model names keep being updated, refer to the official pages for the latest. The rough correspondence is:

OpenAI's flagship series is responsible for heavy tasks such as multi-modality, complex reasoning, and agents. The price of the mid-range series is significantly lower than that of the flagship series and is suitable for large-volume requests. In addition, there are specialized reasoning optimization series, image generation series, and speech transcription series scattered in different product lines.

Anthropic's Claude series also has three levels: the flagship (Opus family) focuses on coding and complex tasks, the balanced (Sonnet family) is the daily workhorse, and the extremely fast and cheap (Haiku family) is suitable for a large number of lightweight calls. Version numbers change frequently, so it is recommended to check the current model list released on the Anthropic official website directly.

Google's Gemini series is also divided into flagship (Pro), balanced (Flash) and extremely small (Flash Lite), and there is also an on-device Nano for use on Android devices. Google opens up its models through both AI Studio and Vertex AI.

If you only look at "who is stronger on the mainstream list", the gap between the three companies has been very small, and the conclusions of different lists often conflict. The specific scores will not be cited here. I will talk about how to test it myself later.

The difference in experience between registering and getting a key

The differences in the developer registration processes of the three companies are mainly reflected in regional support and payment methods.

OpenAI provides API through platform.openai.com, and you can get the key immediately after registering an overseas credit card. There has never been a direct channel for accounts in mainland China, and overseas identities and overseas cards are usually required for normal use.

Anthropic provides API through console.anthropic.com. The process is similar and requires email and mobile phone verification. There is currently no direct connection channel in mainland China, so you usually have to use overseas legal entities or third-party agents.

Google provides two paths: AI Studio has the lowest registration threshold, can be used with a Google account, has a free quota, and is suitable for prototyping; production use is usually migrated to Vertex AI, which needs to be bound to a GCP project and payment method.

The domestic "compliant version" is currently relatively stable: the OpenAI series is accessed through Microsoft Azure China version, and the Anthropic series is accessed in some overseas regions through AWS Bedrock. Google Vertex AI does not provide a domestic compliance version in mainland China. The specific available areas and model lists will be updated by each company. It is best to confirm with the corresponding cloud vendor before purchasing compliance.

If your business itself is in China, the most direct way to bypass this layer is to use domestic alternatives: GLM, DeepSeek, Dark Side of the Moon Kimi, Alibaba Qwen, Byte Doubao and other series. The registration process is smooth, the payment is domestic, and many manufacturers have provided interfaces compatible with OpenAI SDK, and you can switch by just changing the endpoint.

The overall landscape of rate limiting and concurrency

The three companies have different design ideas on speed management.

OpenAI is a tiered system (Tier), which is automatically upgraded based on historical cumulative consumption and account opening time. The higher the tier, the higher the number of requests per minute and the number of tokens. The speed of a new account is relatively restrained at the beginning, which is suitable for starting with small traffic, and you can apply for an increase later.

Anthropic does not have a clear rating scale like OpenAI, but there are rate limits by account and model. If you need a higher rate, you can submit an application, and production users can also go to enterprise sales to get customized quotas. Anthropic also has a Batch API, which can submit non-real-time tasks in batches at a much cheaper price.

Google's quota on Vertex AI can be checked in the GCP console. Before production, you usually need to apply separately to raise the quota to the level required by the business. AI Studio is suitable for prototyping, and the production environment should not rely directly on the free tier.

Specific to latency and stability, the three companies differ greatly at different times and in different regions. Don't look at the numbers given in any blog posts for this kind of thing. Run it online for a week using real business traffic, and the results will be meaningful.

Contextual windows and long document handling

The three flagships currently support the context of million-level tokens. The specific upper limit and price strategy are subject to official announcement. It should be noted that the upper limit of the window is "how much can be filled", which does not mean "the accuracy will not be lost after it is full".

From experience, the first thing to lose accuracy in long context scenarios is "finding one or two details in a very long document", which is often called the needle in haystack type of task. Each company has made optimizations in this direction, but differences in actual measurements still exist. If your business is large document retrieval or long meeting minutes analysis, it is worth running a comparison with your own real documents instead of trusting any static evaluation.

The "structuring capabilities" on the output side also deserve attention. Which one is more stable when the model generates JSON, tables, or Markdown directly affects the complexity of subsequent parsing code. The overall level of the three companies is improving, but when it comes to the schema you commonly use, you still have to use your own data to test it.

Price-wise, the cost of long context input can vary greatly. Google has always been relatively cheap in the "large input + small output" scenario, Anthropic flagship is more expensive, and OpenAI is in the middle. However, the price list is updated very frequently, so I will not cite specific figures in this article, but refer to the pricing pages of the three companies.

Function calls and tool usage

All three support function calls/tool calls, and the design styles are slightly different.

OpenAI places the tool definition under the tools field. The model determines whether to call and the parameters. The streaming mode will incrementally return the calling JSON. The ecosystem is mature, and frameworks such as LangChain and LlamaIndex treat it as a first-class citizen by default.

Anthropic's tool_use is a structured content block. The model returns structured fields directly instead of strings. The code processing is slightly cleaner, and it supports returning multiple parallel tool calls at one time.

Google places function calls under the tools configuration and manages them uniformly with other multi-modal fields. When using Vertex AI, you must first adapt to GCP's set of certifications and project management.

In actual development, the difference is mainly the SDK style and ecological maturity, not the essential gap in capabilities. If your code has been built on a certain SDK, switching to another company requires writing an adaptation layer. There are already many open source middlewares that do this, just choose one you trust.

Division of labor in multimodal capabilities

To summarize simply and crudely:

OpenAI has the most complete product line: there are specialized products for text, image understanding, image generation, speech input and output, and video generation, but they are scattered in different models. When combined, they are more like building an ecosystem.

Anthropic mainly focuses on text and image understanding, as well as in-depth scenarios such as encoding and long documents. The support for video and native audio is not as good as the other two. If your application is mainly text and images and has high requirements for encoding or inference quality, Claude is a very convenient choice.

Google is more radical in the direction of native multi-modality. Text, images, audio, and video are processed uniformly in the same model, which is the most consistent in video and audio scenarios.

For projects that require multi-modal and complete closed loops, Google can cover more of them; for projects that require specialized image generation or video generation, OpenAI's corresponding products are more mature; for projects that focus on coding, long articles, and writing, Anthropic is usually the first choice.

Several experiences on price strategy

The price list is updated frequently, so don’t memorize the specific numbers. Several empirical judgments are given, which can be used for rough estimation:

At the same level, Google's Flash/Flash Lite series usually have the lowest cost in the "very large number of lightweight requests" scenario.
The flagship output price of Anthropic is generally on the high side, but in the scenario of getting it right once and reducing retries, the actual cost is not necessarily more than that of the opponent.
The overall price of OpenAI is in the middle, and old models often drop in price after new models are released.
The price of long context input is very different among the three companies. You must do a separate calculation before processing large documents.
All three companies have discounts for batch/asynchronous tasks. Try to go this route for non-real-time tasks.

If you really want to calculate the cost, the method is to estimate the total monthly input token amount, the total output token amount, the proportion of long context, and whether it can be batched, and then use each company's current price list to calculate all three candidate positions, rather than selecting based on impressions.

Suitable scenarios for each of the three companies

Take away the specific models and scores, and only look at the trade-offs in the general direction:

For coding assistant tools, the Claude series has the most stable reputation among mainstream developers. Mainstream IDE tools such as Cursor, Windsurf, and Aider recommend it by default, and there is a reason for it. The monthly fee for the Pro level is around 20 US dollars, please refer to the official page for details.

For general conversational products and chatbots, OpenAI has the earliest start in user experience, ecosystem, and plugins, with the broadest SDK compatibility. If you are building a conversational product for the C-side, it is almost impossible to go wrong starting with it.

In scenarios such as long document processing, contract review, podcast transcription summarization, and video content analysis, Google has obvious advantages in long article price and native multi-modality, while Anthropic is more stable in terms of "accuracy."

In scenes where Chinese is the main language, domestic models are sufficient, and the price is much lower than overseas. It is recommended to use the domestic model as a backup at least and leave a fallback outside the main path.

Regarding enterprise-level compliance, OpenAI through Azure, Anthropic through AWS Bedrock, and Google through Vertex AI/GCP all have corresponding compliance and data isolation solutions. Which one is suitable for you depends on what contracts the existing cloud vendors have signed, rather than just looking at the model itself.

How do I run a review myself?

Don't believe "who is stronger" given by anyone (including this article). The simplest way is:

The first step is to select 30 to 50 real examples from your own business. Each example has clear criteria for "good answers" and "bad answers".

The second step is to run a total of six models from three flagships and three mid-range brands using the same prompt and collect all the answers.

The third step is to put the answers and ground truth together, do a blind evaluation (evaluate it yourself or ask a colleague to evaluate it, the key is not to see the name of the model), and count the results according to business indicators.

The fourth step is to calculate "quality" and "price + speed" together to see which one has the highest cost performance, rather than just picking the one with the highest score.

This process can be completed within a week, and the conclusions drawn are closer to your business than any evaluation agency.

The overall trends of the three companies in the next year

Without predicting the specific version number, a few things are certain:

Prices will continue to go down. Mid-range and ultra-cheap grades will bear more and more demands, and flagship grades will gradually become "quality guarantees."

Long context and native multimodality will further become basic capabilities rather than advanced selling points.

Agentization is a common direction among the three companies. Tool invocation, long-process task execution, and multi-step reasoning will increasingly be supported natively at the model layer, rather than relying on prompt engineering.

The position of domestic models will continue to rise, and the price advantage + Chinese scene advantage will push them to become the default option for many businesses, while overseas flagships will maintain the "quality first" scene.

To sum up, there is no standard answer to "which one is the strongest" for large model API selection in 2026, only "which one is most relevant to your business". Run your own business sample first, and then decide the main path and fallback, which is more effective than any evaluation list.

FAQ

Should I use one from the beginning or a mix of three?

It is recommended to choose one first to run the business. Hybrid scheduling will increase the complexity of the architecture and should only be done if it is worth it. If the business scenario is single, one company is enough; if the business scenario is diverse, such as doing coding assistant and long text retrieval at the same time, you can use Claude for coding and Gemini for long text. This mix is reasonable. Small and medium-sized projects should be simple first, and don’t pursue multi-model routing right from the start.

What should domestic developers do if they cannot directly call the three companies?

Roughly three directions. One is through the compliant version of the cloud vendor, such as the Azure China version of the OpenAI service and some Claude models on AWS Bedrock. The specific available models and regions are subject to the latest announcement on the official website. The second is to directly access domestic APIs through OpenAI compatible protocols. Many manufacturers such as DeepSeek, Kimi, and Zhipu support only changing the endpoint without leaving the code unchanged. The third is to use overseas legal entities to access directly. This method is suitable for teams with overseas business. The actual choice of most small and medium-sized developers is to directly replace them with domestic ones. The domestic models in 2026 are sufficient in most scenarios.

Claude is so much more expensive than OpenAI, is it still worth using?

It depends on the scene. In scenarios where coding, long documents, and high-quality structured output are required, Claude has a higher probability of giving the right answer at one time, which means fewer retries. The actual total API cost may not be higher than that of cheaper opponents. For daily chats and high-frequency simple requests, it is a waste to use flagship. It is more suitable to choose Sonnet or Haiku.

The price advantage of Gemini is so obvious. Why is it not the default option?

historical and ecological reasons. OpenAI started the earliest and has the widest SDK compatibility. Mainstream frameworks such as LangChain and LlamaIndex are connected to OpenAI by default. Vertex AI must be equipped with GCP, and the threshold is higher than direct connection to OpenAI. However, the strength of the Gemini flagship in 2026 has caught up, and the price advantage is obvious. More and more new projects use it as the default option in scenarios such as long documents, native multi-modal, and batch requests.

What to do if the API key is leaked

Use three layers of protection together. First, the key should never be entered into code or git, put in environment variables or secret manager. Second, the production key and development key are separated, and the production key limits the IP whitelist or uses a trusted server call. Third, all three companies support setting a monthly spending limit. Even if the key is leaked, the loss will be capped. Once a leak is detected, immediately revoke it in the management console and create a new key, then check the logs to identify the cause of the leak, which is usually due to hard-coded values in configuration or code.

Source of inspiration: Issue 394 of Ruan Yifeng's "Technology Enthusiasts Weekly" https://www.ruanyifeng.com/blog/2025/10/weekly-issue-394.html

📝 本文来自抖文 www.douwen.me ，转载请保留出处。

原文链接：https://douwen.me/archives/1069/

💬 评论 (6)

ResearcherJ 2026-05-18 08:07 回复

Sharing this with my team.

AIWatcher 2026-05-17 20:38 回复

Solid breakdown, very useful.

SEOFan 2026-05-17 23:01 回复

Loved the FAQ section.

GrowthHacker 2026-05-17 18:58 回复

Bookmarked for reference.

DataNerd 2026-05-18 06:09 回复

Easy to follow.

DigitalNomad 2026-05-18 01:57 回复

Best summary I've read on this.