What's OpenAI's 'Operator'? An AI agent that performs web tasks for users

OpenAI's latest offering, "Operator", can perform digital tasks such as booking flights, planning trips and ordering groceries, navigating websites and apps with a virtual mouse and keyboard much as a human would.

In a major development in the world of AI, OpenAI has released a "research preview" of an AI agent called "Operator", which can perform web tasks for users. Instead of doing a task themselves, users will be able to direct Operator to do things like ordering dinner with specific ingredients, planning a trip around their budget and interests, or booking flights.

"Today, we introduced a research preview of Operator, an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning," OpenAI says in a blog post.

The AI agent will be launching in the US for now for the ChatGPT Pro tier at $200 per month.

The AI agent is trained to interact with graphical user interfaces (GUIs) — the buttons, menus, and text fields people see on a screen — just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.

OpenAI says: "AI agent builds off of years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications."

Sam Altman-led OpenAI says that while CUA is still early and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. "These results highlight CUA’s ability to navigate and operate across diverse environments using a single general action space."

OpenAI says the AI agent has been developed with safety as a top priority, to address the challenges posed by an agent having access to the digital world. "We are releasing CUA through a research preview of Operator at operator.chatgpt.com for Pro Tier users in the U.S. to start. By gathering real-world feedback, we can refine safety measures and continuously improve as we prepare for a future with increasing use of digital agents."

How the AI agent works and its implications

OpenAI's agent or CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialised APIs.

Given a user’s instruction, the agent operates through an "iterative loop" that integrates perception, reasoning, and action. The AI agent establishes a new state-of-the-art in both computer use and browser use benchmarks by using the same universal interface of screen, mouse, and keyboard, says the company.
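The loop of perception, reasoning and action can be pictured with a toy sketch. This is not OpenAI's code or API: the names `FakeBrowser`, `decide` and `run_agent` are hypothetical, the "screen" is a small dictionary rather than raw pixels, and the "reasoning" is a hand-written rule rather than a model. It only illustrates the shape of such a loop: look at the screen, pick the next action, apply it, and repeat until the task is done.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # hypothetical name of an on-screen element
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real screen: one search box and one search button."""
    typed: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive raw pixels; this toy returns a dict.
        return {"search_box": self.typed, "results_visible": self.submitted}

    def apply(self, action: Action) -> None:
        # Simulate the virtual keyboard and mouse.
        if action.kind == "type" and action.target == "search_box":
            self.typed = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.submitted = True


def decide(instruction: str, screen: dict) -> Action:
    """Toy 'reasoning' step: choose the next action from the current screen."""
    if screen["results_visible"]:
        return Action("done")
    if screen["search_box"] != instruction:
        return Action("type", "search_box", instruction)
    return Action("click", "search_button")


def run_agent(instruction: str, browser: FakeBrowser, max_steps: int = 10) -> list:
    """Run the perceive / reason / act loop and return the action kinds taken."""
    history = []
    for _ in range(max_steps):
        screen = browser.screenshot()          # perception
        action = decide(instruction, screen)   # reasoning
        history.append(action.kind)
        if action.kind == "done":
            break
        browser.apply(action)                  # action
    return history


if __name__ == "__main__":
    print(run_agent("flights Zurich to Vienna", FakeBrowser()))
```

Because the agent re-reads the screen on every pass, it reacts to what actually happened rather than to what it expected, which is the property the article describes as handling errors and adapting to unexpected changes. The `max_steps` cap is an assumption added here so the toy loop always terminates.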

Reacting to the development, former OpenAI founding team member Andrej Karpathy says projects like OpenAI’s Operator are to the digital world as humanoid robots are to the physical world. "One general setting (monitor keyboard and mouse, or human body) that can in principle gradually perform arbitrarily general tasks, via an I/O interface originally designed for humans. In both cases, it leads to a gradually mixed autonomy world, where humans become high-level supervisors of low-level automation. A bit like a driver monitoring the Autopilot. This will happen faster in digital world than in physical world because flipping bits is somewhere around 1000X less expensive than moving atoms. Though the market size and opportunity feels a lot bigger in physical world."

Rowan Cheung, founder of the AI newsletter The Rundown AI, says he got early access to ChatGPT Operator. "It's OpenAI's new AI agent that autonomously takes action across the web on your behalf."

Some of the most impressive use cases he tried include ordering dinner ingredients based on a picture and a recipe; planning a weekend trip based on hidden gems from Reddit, his budget and his interests; researching crypto investments based on tokens that are actually worth looking into; booking a one-way flight from Zurich to Vienna using the booking integration; and scheduling an appointment with a barber after checking his Google Calendar availability.

As a downside, he says Operator is still a research preview and is improving. "I found that: Quite a few sites were blocked after they detected the AI. There's a limited set of partner integrations. Its true purpose is to take actions across the web."
