Industry Aspirations vs. User Realities with AI Agent Software
There is growing imprecision about what “AI agents” are, what they can do, and how effectively they can be used by their intended users. We pose two key research questions: (i) How does the tech industry conceive of and market “AI agents”? (ii) What challenges do end-users face when attempting to use commercial AI agents for their advertised uses?
We first performed a systematic review of marketed use cases for 102 commercial AI agents, finding that they fall into three umbrella categories: orchestration, creation, and insight. Next, we conducted a usability assessment where N = 31 participants attempted representative tasks for each of these categories on two popular commercial AI agent tools.
Analyzed 102 commercial AI agents to build a taxonomy of marketed use cases
31 participants using Operator & Manus
Identified usability barriers and implications
Orchestration: Agents that act on behalf of users to manipulate software interfaces
Creation: Agents that generate structured documents with well-defined formats
Insight: Agents that distill knowledge into structured takeaways
Users completed tasks with “Good” to “Excellent” usability ratings
Manus excelled at slide making (90.6), Operator at holiday planning (88.8)
Despite success, users faced 5 significant usability challenges
Higher scores indicate better usability (0-100 scale)
Agent capabilities don't match user expectations
“You've got to sit there and make sure that your initial prompt is perfect.”
Users can't predict the agent's actual capabilities, turning delegation into frustrating "prompt gambling".
Agents demand immediate delegation without demonstrating competence or security
“I was waiting for it to ask me for some more information.”
Agents expect immediate delegation without first establishing credibility through active preference elicitation or demonstrating the competence needed to handle sensitive tasks.
Rigid interaction styles that don't adapt
“It's kind of like I'm giving you a job, and you're throwing the job back at me.”
Agents act as "lone wolf" execution tools that fail to adapt to a user's need for hands-on guidance or mid-task oversight.
Overwhelming users with excessive output
“Oh, my God! It threw out so much stuff...it's almost an overwhelming amount of information.”
This overload arises from agents generating excessive, poorly formatted output and forcing users to articulate complex, subjective preferences in a cognitively demanding way.
Agents lack self-awareness of limitations
“It just was kind of circling...it's seeking to provide an answer rather than to say 'I don't know.'”
They lack the self-awareness to recognize their own errors or limitations, leading them to get stuck in time-wasting "try-fail cycles" that require manual human debugging.
Design implications for building next-generation AI agents
Collect preferences, skills, and collaboration styles
Develop metacognitive abilities to recognize limitations
Adapt interface based on task type and user preferences
Support user control during planning and execution phases
Support multiple input modalities beyond text prompts
Enable precise iteration on outputs with contextual controls
@article{shome2025johnny,
  title={Why Johnny Can't Use Agents: Industry Aspirations vs. User Realities with AI Agent Software},
  author={Shome, Pradyumna and Krishnan, Sashreek and Das, Sauvik},
  journal={arXiv preprint arXiv:2509.14528},
  year={2025}
}