Latest Update
OpenAI's GPT-5.6: three models, one half the price
2026.06.29


Generate with ChatGPT Image 2 Medium. Prompt:
Image 1 (scene reference): https://assets.carat-api.im/upload_from_app/public/20260629/e6d7270e-7fee-4411-95d0-7030092b7e11.jpg
Image 2 (identity reference): attach a clear front photo of yourself
Use Image 1 as the composition and scene reference. Use Image 2 as the identity reference for the person shown on the jumbotron.
Create one photorealistic vertical 9:16 first-frame image for an AI video.
Match Image 1 closely for the overall scene: nighttime Korea University campus festival, huge burgundy LED jumbotron, castle-like campus building silhouette behind it, crowd in the lower foreground, burgundy KU flags or banners, realistic festival-night lighting, and handheld phone-camera perspective.
Replace the person shown on the jumbotron with the person from Image 2. Preserve the real identity from Image 2: face shape, facial proportions, skin tone, hair, eyes, nose, lips, and natural expression. The person should appear only inside the LED jumbotron screen, not as a real person standing in the foreground.
The jumbotron should show the person as if they were naturally caught by the live festival camera. They are watching the stage rather than posing for the viewer, looking slightly upward and 25-45 degrees off-camera, with a soft surprised smile. Keep the face clear and recognizable while still integrated into the LED screen with subtle pixel texture, scanlines, screen glow, and low-light exposure.
Keep the Korea University atmosphere from Image 1: deep crimson and burgundy lighting, realistic outdoor concert glow, dark student crowd, warm stage lights, subtle haze, LED texture, and phone-camera noise. If visible text fits naturally, use simple readable text such as "KOREA UNIVERSITY FESTIVAL LIVE".
Keep phones sparse and realistic. Only a few smartphones should be visible in the lower foreground, around 3 to 5 total. Visible phone screens should show a tiny blurred view of the same jumbotron scene, with natural glare and exposure differences. Avoid dense phone clusters, large foreground phones, black blank phone screens, bright phone flash dots, baseball/stadium broadcast elements, glossy advertising poster look, watermark, or extra captions.
Then generate the video, using the image above as @image_1. Prompt:
Use @image_1 as the exact first frame and main scene reference. Use @image_2 as the expression reference for the subject shown on the jumbotron.
Create a realistic vertical video of a nighttime Korea University festival. The camera is filming from inside the crowd, looking up at a huge burgundy LED jumbotron near a castle-like campus building silhouette. The jumbotron shows the subject from @image_2 as if they were naturally caught by the live festival camera. Keep the crimson university atmosphere, large outdoor screen, dense student crowd, stage lighting, and handheld phone-camera perspective from @image_1.
The subject on the jumbotron is watching the stage at first. They hear the crowd reaction, realize they are on the big screen, look slightly surprised, then smile shyly and naturally. Add a small head movement, subtle hair movement, and a gentle brief wave. The expression should feel like a real candid festival moment.
The students below the screen are active throughout the video. They raise phones, cheer, wave, point at the jumbotron, move shoulders, and record the moment. Phone screens glow across the audience. Stage lights sweep through the crowd. The LED texture, slight pixel grid, screen bloom, and live-camera feel remain visible.
Camera motion: realistic handheld phone footage with a slow push-in toward the jumbotron. Keep the scene lively, warm, and believable, like a real Korea University festival clip that someone captured on their phone.