Day 6/12 OpenAI #723

Open · wants to merge 2 commits into base: main
84 changes: 84 additions & 0 deletions blog/en/day-6-12-openai-chatgpt-can-finally-see-what-you-see.mdx
@@ -0,0 +1,84 @@
---
title: "Day 6/12 OpenAI: ChatGPT Can Finally See What You See"
description: "OpenAI revolutionizes AI interaction by adding live video capabilities to ChatGPT, letting users share their camera view or their screen for real-time visual assistance"
image: ""
authorUsername: "sanchayt743"
---

# Day 6/12 OpenAI: ChatGPT Can Finally See What You See

OpenAI just added live video to ChatGPT, changing how we talk with AI forever. Starting December 13, 2024, users can show ChatGPT what they're working on through their camera or by sharing their screen. The AI now sees and understands visual information in real-time, helping with everything from fixing technical problems to learning new skills.

This update builds on ChatGPT's **Advanced Voice Mode**, which already lets people talk naturally with the AI in over 50 languages. Now, when you click the voice mode button, you'll see a new camera option. Turn it on, and ChatGPT starts seeing what you see. Show it a broken device, walk it through a process, or share your screen to get instant help with any task.

The rollout starts now for Team subscribers and most Plus and Pro users. If you're a Plus or Pro user in Europe, you'll get access in the coming weeks. Enterprise and education customers will see these features early next year. OpenAI made sure the system works smoothly across phones, tablets, and computers.

Along with video, OpenAI surprised users with a special holiday addition: **Santa Mode**. Look for the snowflake icon, and you can have real conversations with an AI Santa who sees, hears, and responds with holiday spirit. To mark the launch, OpenAI is resetting everyone's voice mode usage limits one time, so all users can try the new capabilities through a festive lens.

The update marks a significant change in how AI helps us. Instead of explaining problems with words alone, users can now show ChatGPT exactly what they mean. The AI watches, understands, and guides users through solutions in real-time. This makes complex tasks simpler and helps bridge the gap between human visual thinking and AI assistance.

In the coming sections, we'll explore exactly how these features work, share real examples of the technology in action, and show you how to make the most of ChatGPT's new visual powers.

## Inside ChatGPT's New Video Powers

<Img src="https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/7f0aa4ca-dfd3-4e24-2d30-41327e7eae00/full" />

Using ChatGPT's video feature feels natural from the first click. Open the app, and you'll find the familiar voice mode button at the bottom right of your screen. Click it, and you'll now see a new camera icon. This single button opens up a world of visual interaction.

Starting a video chat takes just seconds. The interface stays clean and simple, with a blue orb at the top of your screen showing that ChatGPT is watching and listening. You can switch between your front and back camera, or share your screen instead. ChatGPT sees everything in real-time, understanding and responding to what you show it.

The real magic happens when you start using the feature. In a live demonstration, a user made pour-over coffee while ChatGPT watched and guided each step. The AI noticed details like water temperature and pouring technique, and even suggested improvements to the brewing process. These weren't pre-programmed coffee instructions; ChatGPT adapted its guidance based on what it saw in real time.


Screen sharing works just as smoothly. Click the share option, choose what you want to show, and ChatGPT starts analyzing your screen. Need help with a spreadsheet? Show it your work. Stuck on a design? Share your canvas. The AI sees your screen just like a human colleague would, pointing out issues and suggesting improvements instantly.

What makes this feature powerful is how it combines with ChatGPT's existing abilities. While watching your video feed, it can still process voice commands, understand context from previous messages, and even recall details from earlier in your conversation. During the launch demonstration, ChatGPT remembered people's names and what they were wearing, showing how it builds a complete understanding of the interaction.

The system also handles different types of visual information. From text on screens to physical objects, from people's gestures to on-screen animations, ChatGPT processes it all in real-time. This means you can switch between showing objects, sharing screens, and regular conversation without missing a beat.

For anyone worried about privacy, OpenAI built this feature with clear user control. The camera only activates when you choose, with obvious indicators showing when video is live. You can switch the video feed off anytime, and return to voice-only or text chat instantly.

## Real-World Applications: ChatGPT in Action


The real power of ChatGPT's video capabilities becomes clear when you see it in action. Let's look at a practical example that shows exactly how this technology makes a difference.

<Img src="https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/75d50476-fbfb-4cf2-2318-5b3082d65700/full" />

During the launch demonstration, OpenAI showed how ChatGPT guides users through making the perfect pour-over coffee. As shown in the sequence above, the AI watched every step of the brewing process. It noticed details that matter: the water temperature, the correct pouring height, the timing of each step. When the user started pouring water, ChatGPT gave real-time guidance on technique, explaining the importance of the blooming process and suggesting adjustments to improve the brew.

This coffee demonstration reveals several key capabilities. First, ChatGPT recognizes and tracks physical objects in real-time. It identified the kettle, filter, and brewing equipment instantly. Second, it understands processes and sequences, knowing what should happen next and why. Third, it provides guidance that adapts to what it sees, not just following a script but responding to the user's actual actions.

These capabilities translate across countless real-world scenarios. **Mechanics** can point their cameras at engine problems for instant diagnosis. **Artists** get real-time feedback on their technique. **DIY enthusiasts** receive step-by-step guidance while keeping both hands free to work. The system even helps with complex software tasks, watching your screen as you work through problems and suggesting better approaches.

The key difference lies in ChatGPT's ability to process visual information naturally. Instead of users trying to describe what they're seeing or doing, they can simply show the AI and get immediate, relevant responses. This cuts through communication barriers and makes problem-solving faster and more accurate.

For professionals, this means streamlined workflows. Developers debug code with an AI that can actually see the error messages and understand the context. Designers get instant feedback on layouts and compositions. Teachers can demonstrate concepts while receiving suggestions for clearer explanations.

Crucially, the video feed doesn't operate in isolation. While watching your actions, ChatGPT maintains context from your conversation, processes voice commands, and builds a complete picture of what you're trying to achieve. The result is assistance that feels natural and truly helpful, adapting to your needs in real time.

## New Voice Addition: Santa Joins ChatGPT's Voice Lineup

<Img src="https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/c2f75adb-5b4a-4114-2382-2d72edffe400/full" />

OpenAI added a festive touch to ChatGPT's voice features by introducing a new Santa voice option. Throughout December, users can engage with ChatGPT using this specially designed voice that brings holiday warmth to conversations. The new voice maintains the same high-quality audio processing while adding a distinct jolly character to responses.

Accessing the new voice is simple. Users will notice a snowflake icon on their home screen or can select it through ChatGPT settings. OpenAI has made this special voice available to all users with Advanced Voice access, and they've even reset voice usage limits once so everyone can try the new addition.

The Santa voice works across all standard ChatGPT functions. Whether you're asking technical questions, seeking advice, or just having a casual conversation, you can do it all with the new voice option. It's available in the latest mobile apps, desktop apps, and on chat.openai.com for web users.

For Plus and Pro subscribers, this new voice option integrates perfectly with the recently launched video capabilities. The system maintains its ability to process multiple languages and provide natural, contextual responses, just now with an optional festive delivery style.

## Looking Ahead: What This Really Means For You

I've spent hours testing ChatGPT's new video capabilities, and I keep coming back to one thought: this changes the game for anyone who's ever tried to explain a technical problem or learn a new skill online. Think about the last time you struggled to describe an issue to technical support, or tried to learn something new from written instructions. Now imagine just showing it directly to an AI that understands what it's seeing.

The timing of this release tells us something important. Yes, OpenAI faced delays after their voice rollout sparked controversy. Yes, Google's Gemini just launched similar features. But what matters isn't who shipped first; it's how these tools will reshape our daily interactions with technology.

This isn't just about ChatGPT getting a camera. It's about the moment AI truly entered our visual world. When your teenage cousin needs homework help, or your grandmother can't figure out her new phone settings, or you're trying to fix your car, showing will finally replace telling. The barriers between human visual thinking and AI assistance are starting to crumble.

But here's what keeps me up at night: we're watching the first steps of something much bigger. Remember how video calling seemed futuristic until suddenly it was everywhere? That's where we are with visual AI. Today it's helping you make coffee or debug code. Tomorrow? The applications seem limitless.

As we head into 2025, one thing is clear: the way we work with AI is fundamentally changing. The question isn't whether visual AI will become part of your daily life – it's how you'll use it to solve problems you once thought impossible.

I'd love to hear your thoughts. How do you see yourself using these new capabilities? What problems could you solve by showing instead of telling? The future of AI interaction is being written right now, and we're all part of the story.
21 changes: 21 additions & 0 deletions script_output.txt
@@ -0,0 +1,21 @@
Processing file: blog/en/day-6-12-openai-chatgpt-can-finally-see-what-you-see.mdx
Downloading image from URL: https://iili.io/2W4oL6g.md.jpg
Successfully downloaded image to: images/2W4oL6g.md.jpg
Uploading image: images/2W4oL6g.md.jpg
Successfully uploaded image. Variant URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/7f0aa4ca-dfd3-4e24-2d30-41327e7eae00/full
Replaced image with new URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/7f0aa4ca-dfd3-4e24-2d30-41327e7eae00/full
Deleted local image: images/2W4oL6g.md.jpg
Downloading image from URL: https://iili.io/2W4oZFa.md.jpg
Successfully downloaded image to: images/2W4oZFa.md.jpg
Uploading image: images/2W4oZFa.md.jpg
Successfully uploaded image. Variant URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/75d50476-fbfb-4cf2-2318-5b3082d65700/full
Replaced image with new URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/75d50476-fbfb-4cf2-2318-5b3082d65700/full
Deleted local image: images/2W4oZFa.md.jpg
Downloading image from URL: https://iili.io/2W4otcJ.md.jpg
Successfully downloaded image to: images/2W4otcJ.md.jpg
Uploading image: images/2W4otcJ.md.jpg
Successfully uploaded image. Variant URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/c2f75adb-5b4a-4114-2382-2d72edffe400/full
Replaced image with new URL: https://imagedelivery.net/K11gkZF3xaVyYzFESMdWIQ/c2f75adb-5b4a-4114-2382-2d72edffe400/full
Deleted local image: images/2W4otcJ.md.jpg
Successfully processed file: blog/en/day-6-12-openai-chatgpt-can-finally-see-what-you-see.mdx
CHANGES_MADE
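
The script that produced this log isn't included in the diff, but the output implies a straightforward migration loop: find externally hosted images in the MDX file, download each one, re-upload it to Cloudflare Images, rewrite the URL in place, and delete the local copy. Below is a minimal sketch of that workflow under stated assumptions: it uses the standard Cloudflare Images v1 upload endpoint, but the environment variable names and the `upload_to_cloudflare` helper are hypothetical, not the author's actual script.

```python
import os
import re
import requests

# Hypothetical configuration -- the real script's credentials and naming
# are not shown in this PR.
ACCOUNT_ID = os.environ["CF_IMAGES_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_IMAGES_API_TOKEN"]
UPLOAD_URL = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/images/v1"

# Matches the externally hosted image URLs seen in the log above.
IMG_URL = re.compile(r"https://iili\.io/\S+?\.jpg")

def upload_to_cloudflare(path):
    """Upload a local image to Cloudflare Images; return the 'full' variant URL."""
    with open(path, "rb") as f:
        resp = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            files={"file": f},
        )
    resp.raise_for_status()
    variants = resp.json()["result"]["variants"]
    # The log uses the "full" variant; fall back to the first one otherwise.
    return next((v for v in variants if v.endswith("/full")), variants[0])

def process_file(mdx_path):
    print(f"Processing file: {mdx_path}")
    text = open(mdx_path, encoding="utf-8").read()
    os.makedirs("images", exist_ok=True)
    changed = False
    for url in IMG_URL.findall(text):
        print(f"Downloading image from URL: {url}")
        local = os.path.join("images", url.rsplit("/", 1)[-1])
        img = requests.get(url)
        img.raise_for_status()
        with open(local, "wb") as f:
            f.write(img.content)
        print(f"Uploading image: {local}")
        new_url = upload_to_cloudflare(local)
        text = text.replace(url, new_url)
        print(f"Replaced image with new URL: {new_url}")
        os.remove(local)
        print(f"Deleted local image: {local}")
        changed = True
    with open(mdx_path, "w", encoding="utf-8") as f:
        f.write(text)
    print(f"Successfully processed file: {mdx_path}")
    if changed:
        print("CHANGES_MADE")

if __name__ == "__main__":
    process_file("blog/en/day-6-12-openai-chatgpt-can-finally-see-what-you-see.mdx")
```

One design note: writing the MDX file back only after every image has been uploaded keeps a mid-run failure from leaving the post half-migrated, which matters since the script deletes each local copy as it goes.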