Complete 2026 tutorial on ChatGPT 4o image generation: from Ghibli style to ID photos

📅 2026-05-15 11:23:58 👤 DouWen Editorial 💬 6 comments 👁 8

ChatGPT 4o image generation is one of the most popular practical features of 2026. Compared with the early DALL-E 3 era, the current 4o model can generate images directly in the chat, understand context, apply several rounds of edits in succession, and even interpret reference images. Any user who can write a prompt can use ChatGPT 4o to create posters, avatars, illustrations, ID photos, product images, and concept sketches.

This article collects the latest techniques as of May 2026. It covers subscription requirements, basic prompt writing, practical prompts for eight mainstream styles, troubleshooting for common errors, and copyright issues for commercial use. After one read-through you can start generating images, with no need to learn Midjourney or Stable Diffusion.

Subscription requirements for ChatGPT 4o image generation

Free users can generate two images per day under the policy OpenAI adjusted in April 2026. Generation quality is not reduced, but resolution is limited to 1024 x 1024. Plus users get unlimited generation for $20 per month, with resolutions up to 1792 x 1024 or 1024 x 1792. The Pro plan, at $200 per month, also adds priority queuing and Sora video generation credits.

If you only create avatars occasionally, the free version is entirely sufficient. If you do content creation, you will need Plus. Being able to generate hundreds of images a day exceeds the limit of the Midjourney Basic plan, so the value for money is clear.

Step one: choose the right entry point and model

Select GPT-4o at the top of the ChatGPT chat window. This is the default multimodal model in 2026, with image generation built in. Don't choose 4o-mini, which doesn't support image generation. Plus users can also choose GPT-4.5, but its image generation capabilities are about the same as 4o's.

The easiest way to request an image is to describe it directly in Chinese or English. For example, send the sentence "Draw an orange cat jumping in the moonlight, Ghibli style". ChatGPT will automatically call its image generation backend. An image usually comes back in 30 to 60 seconds. Once it is generated, hover over the image and right-click to download the original.

Five elements of a good prompt

A complete prompt contains five elements. The subject: a character, animal, object, or scene. The style: Ghibli, realistic photography, flat illustration, 3D rendering, or pixel art. The lighting: backlight, morning light, neon, overcast, or studio lighting. The composition: close-up, full body, top view, profile, or wide angle. Finally, the mood: warmth, loneliness, passion, or tranquility.

Here is a complete example: "A 30-year-old female programmer is sitting in front of a Mac typing on the keyboard, with golden side light from the setting sun, half-length composition, realistic photography style, soft atmosphere, and abstract circuit board light spots in the background." A five-element prompt like this produces consistent results, and the quality is noticeably higher than simply writing "draw a programmer".
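The five elements can be treated as a small template. The sketch below is only an illustration of that idea: the `ImagePrompt` class, its field names, and the ordering of the elements are assumptions made for this example, not anything ChatGPT requires. The output is plain text to paste into the chat.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """The five prompt elements from this section (field names are illustrative)."""
    subject: str
    style: str
    lighting: str
    composition: str
    mood: str

    def render(self) -> str:
        # Join the non-empty elements into one comma-separated prompt.
        # The ordering here is an arbitrary choice, not a documented rule.
        parts = [self.subject, self.composition, self.lighting, self.style, self.mood]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = ImagePrompt(
    subject="a 30-year-old female programmer typing at a Mac",
    style="realistic photography style",
    lighting="golden side light from the setting sun",
    composition="half-length composition",
    mood="soft atmosphere",
)
print(prompt.render())
```

Filling in every field before sending is a quick way to notice when a prompt is missing its lighting or composition.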

How to get the most faithful Ghibli-style portraits

In March 2025, ChatGPT unlocked the Ghibli style and it went viral across the internet. As of May 2026, the technique is still popular. The trick is to add "Studio Ghibli style, hand-drawn animation, soft watercolor background" to the prompt. Adding "warm color palette, gentle expression" makes the character's expression gentler.

If you want to reproduce the style of a specific Hayao Miyazaki film, you can specify "Princess Mononoke style" or "Spirited Away style". The former gives a dark green forest atmosphere, while the latter gives a hot spring town atmosphere. Writing the Chinese equivalent of "Hayao Miyazaki style" is also recognized, but the effect is slightly weaker, so English terms are recommended.

ID photos and professional headshots

In 2026, ChatGPT 4o can already output ID-style photos good enough to use as avatars on social platforms. A suitable prompt is "professional headshot of an asian woman in her late 20s, plain white background, soft studio lighting, business casual attire, looking directly at camera, photorealistic". The result basically meets LinkedIn avatar standards.

Note that OpenAI restricts generating photos of specific real people. For example, "Draw an ID photo of Liu Yifei" will be rejected. You can instead describe "a long-haired Asian woman, 25 years old, with a gentle temperament" to get a similar style indirectly. OpenAI keeps tightening this boundary, and has been stricter about celebrity likenesses since March 2026.

Practical examples of posters and marketing graphics

For event posters, a prompt such as "poster design, central headline area reserved blank, vivid gradient background, modern sans-serif vibe, top-down layout" is recommended. If you explicitly write "leave space in the center for the title", the model knows to leave room for the text. Otherwise it will fill in garbled text on its own.

For e-commerce product images, you can write "product photography of a coffee mug on marble surface, soft window light from left, depth of field, minimal style". Images produced with this structure can go directly on a Shopify product page. If any blurred text appears, it is recommended to erase it in Photoshop before using the image.

Hidden tricks for multi-round editing

ChatGPT 4o's strongest capability is multi-round editing. After generating the first image, you can simply say "change the background to a sunset on the beach" or "change the character's outfit to a red coat". The model edits the previous image and keeps the character's face consistent. This is where Midjourney falls short, because MJ has no conversation context.

Be aware, though, that if the changes are too large, the character in the new image may end up with a different face. The trick is to change only one element at a time. For example, change the background first, confirm you are satisfied, and then change the clothes. If you change three things at once, the character will almost certainly drift.
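The one-change-per-round habit can be sketched as a tiny helper that turns a wish list into separate follow-up messages. This is a hypothetical illustration: `plan_edits` and the exact wording "Keep everything else the same" are assumptions of this sketch, not an official ChatGPT feature or a required phrasing.

```python
def plan_edits(changes: list[str]) -> list[str]:
    """Turn a list of desired changes into one follow-up message per change."""
    steps = []
    for change in changes:
        change = change.strip()
        if change:  # skip empty entries
            # The phrasing below is an assumed wording, not a required formula.
            steps.append(f"Keep everything else the same; only {change}.")
    return steps


wishes = [
    "change the background to a sunset on the beach",
    "change the character's outfit to a red coat",
]
for round_number, message in enumerate(plan_edits(wishes), start=1):
    print(f"Round {round_number}: {message}")
```

Send each message only after you are satisfied with the result of the previous round.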

Restricted content and copyright for commercial use

The copyright of images generated by ChatGPT 4o belongs to the user, and the images can be used commercially. OpenAI states this in its Terms of Use. However, Plus users' images may be used by OpenAI for model training unless the "Improve the model for everyone" switch is turned off in the settings.

Content that must not be generated includes real celebrities without authorization, identifiable faces of minors, politically sensitive figures, violent and gory content, and sexual content. The model has a built-in safety filter, and violating requests are rejected outright. Triggering it repeatedly can lead to a warning or even a ban.

Common error reports and troubleshooting ideas

The error "I can't help with that request" usually means content moderation was triggered. Rephrase the prompt to avoid sensitive words, for example replacing "naked" with "wearing light-colored clothes". The error "unable to generate" indicates that the backend is busy; wait a few minutes and try again. If the error persists all day, the daily quota is probably exhausted, since free users get only two images per day.
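The troubleshooting rules above amount to a small lookup table. The sketch below encodes them for quick reference; the `diagnose` function and its `failing_all_day` flag are hypothetical names for this example, and the message fragments are simply the two quoted in this section, not a complete list of ChatGPT errors.

```python
# Message fragments and advice taken from this section; not an exhaustive list.
TROUBLESHOOTING = {
    "i can't help with that request": "Content moderation was likely triggered; rephrase the prompt.",
    "unable to generate": "The backend is likely busy; wait a few minutes and retry.",
}


def diagnose(message: str, failing_all_day: bool = False) -> str:
    """Map an error message to the likely cause described in the article."""
    if failing_all_day:
        return "The daily quota is probably exhausted; free users get two images per day."
    # Normalize case and curly apostrophes before matching.
    text = message.lower().replace("\u2019", "'")
    for fragment, advice in TROUBLESHOOTING.items():
        if fragment in text:
            return advice
    return "No matching entry in this table."


print(diagnose("I can't help with that request."))
print(diagnose("Unable to generate image right now"))
print(diagnose("Unable to generate", failing_all_day=True))
```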

Some variation in image quality is normal. Running the same prompt five times can produce five different results, and you can usually get a satisfactory version within two tries. It is recommended to download and save each image right after it is generated, because images may be lost when the conversation refreshes.

Trade-offs versus Midjourney and Stable Diffusion

If you only care about the ceiling of image quality, Midjourney V7 is still number one. Its detail, lighting, and artistic sense are all half a step ahead of ChatGPT 4o. But the barrier to entry is high: you have to log into Discord, learn parameters, and wait in the queue.

The advantages of ChatGPT 4o are ease of use and conversational editing. You write a prompt in one sentence of Chinese, change a detail with another sentence, and writing the accompanying copy fits naturally into the same conversation. It suits the daily work of content creators, independent media, e-commerce operators, and product managers. Professional illustrators still use MJ or SD, while 4o is enough for everyday users.

Advanced techniques with reference images

In 2026, ChatGPT 4o supports uploading reference images as the basis for generation. You can upload a portrait photo and say "Use this face to generate a Ghibli-style image", and the output will retain the facial features while taking on an animated look. This is the most convenient way to draw stylized avatars for friends and family.

You can also upload scene reference images. For example, uploading a photo of your living room and saying "Use this layout to generate a Nordic-style interior rendering" can provide visual inspiration for a renovation. Architects, designers, and product managers who use reference images this way can significantly improve their workflow efficiency.

A more advanced technique is to stack multiple references. First generate a base image, then upload a second image and say "change the lighting to the neon night scene shown in the second image". Layering several rounds this way gradually approaches the picture in your mind, and is more controllable than trying to write the perfect prompt in one go.

FAQ

How many pictures can ChatGPT 4o free version generate?

As of May 2026, free users can generate 2 images per day. The limit resets on a 24-hour sliding window. A Plus subscription costs $20 per month for unlimited generation. If you only generate images occasionally, the free version is enough, but for regular needs Plus is recommended.
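A 24-hour sliding window means each image frees up its slot 24 hours after it was generated, rather than at midnight. The sketch below models that behavior to show when the next free image becomes available. It is an illustrative model only: `SlidingWindowQuota` is a hypothetical class, and how OpenAI actually implements the limit is not public in this article.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindowQuota:
    """Illustrative model of a '2 images per 24-hour sliding window' limit."""

    def __init__(self, limit: int = 2, window: timedelta = timedelta(hours=24)):
        self.limit = limit
        self.window = window
        self._used = deque()  # timestamps of generations still inside the window

    def _expire(self, now: datetime) -> None:
        # Drop generations that are at least one full window old.
        while self._used and now - self._used[0] >= self.window:
            self._used.popleft()

    def try_generate(self, now: datetime) -> bool:
        """Record a generation if a slot is free; return whether it was allowed."""
        self._expire(now)
        if len(self._used) >= self.limit:
            return False
        self._used.append(now)
        return True

    def next_available(self, now: datetime) -> datetime:
        """Earliest time at which another generation would be allowed."""
        self._expire(now)
        if len(self._used) < self.limit:
            return now
        return self._used[0] + self.window


quota = SlidingWindowQuota()
start = datetime(2026, 5, 15, 9, 0)
print(quota.try_generate(start))                         # first image
print(quota.try_generate(start + timedelta(hours=1)))    # second image
print(quota.try_generate(start + timedelta(hours=2)))    # third attempt, over the limit
print(quota.next_available(start + timedelta(hours=2)))  # 24 hours after the first image
```

Under this model, two images generated at 09:00 and 10:00 free their slots at 09:00 and 10:00 the next day respectively.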

Can the generated images be used commercially?

Yes. The OpenAI Terms of Use clearly state that users own the generated images and have commercial use rights. But make sure your prompts do not infringe third-party copyrights; for example, do not generate Disney characters or existing brand logos. Also note that images generated under the Plus plan may be used for model training, which can be turned off in the settings.

How do I keep characters consistent across generated images?

The most effective approach is to iterate within the same conversation. Once you have a first image you are satisfied with, change only one element at a time, such as the background or the clothing. Don't open a new conversation and re-describe the character, as that will almost certainly change the face. You can also upload a reference image so that the model makes its edits based on the reference face.

Is it better to write prompts in Chinese or English?

English is slightly better. OpenAI's training data is mainly in English, so the model understands English style terms more accurately. Chinese also works, but the model understands some specialized terms, such as "cyberpunk", less clearly. It is recommended to write the core style words in English and the main description in Chinese. This mixed Chinese and English prompt gives the most stable results.

Why is the generated text always garbled?

ChatGPT 4o is still unreliable at rendering text in images. This is a limitation of the underlying model, not of how the prompt is written. The best strategy is to let the AI generate purely visual material and add text afterwards in Photoshop or Figma. If you must have the AI render text, write something like "large clear English word 'SALE' in bold red". Short English words have a higher success rate.

📝 This article comes from DouWen (www.douwen.me). Please keep the attribution when reposting.

💬 Comments (6)

DataNerd 2026-05-14 17:29

Stats really back it up.

DevTools 2026-05-15 05:40

Practical tips not fluff.

ContentDev 2026-05-14 20:00

Step-by-step is gold.

DataNerd 2026-05-15 09:02

Thanks for the detailed comparison.

GrowthHacker 2026-05-14 16:39

Solid breakdown, very useful.

TechReader 2026-05-14 20:42

Sharing this with my team.