Industry aspirations versus user realities with AI agent software.
There is growing imprecision about what “AI agents” are, what they can do, and how effectively they can be used by their intended users. We pose two research questions: (i) how does the tech industry conceive of and market “AI agents,” and (ii) what challenges do end-users face when attempting to use commercial AI agents for their advertised uses?
We performed a systematic review of marketed use cases for 102 commercial AI agents, finding that they fall into three umbrella categories — orchestration, creation, and insight. We then observed N = 31 participants attempting representative tasks for each category on two popular commercial AI agent tools, surfacing five usability barriers that motivate six design recommendations.
Our study proceeds in two parts. First, a systematic review of commercial agents yields a taxonomy of marketed use cases. Second, a think-aloud and interview study observes real users attempting representative tasks drawn from that taxonomy.
Across 102 reviewed tools, marketed use cases cluster into three categories. Counts exceed 102 because many agents span more than one category.
Participants rated the usability of two tools — OpenAI Operator and Manus — on three representative tasks using the System Usability Scale (SUS, 0–100, higher is better).
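For readers unfamiliar with how SUS scores are derived, the standard scoring procedure (Brooke, 1996) converts ten 1–5 Likert responses into a 0–100 score: positively worded odd-numbered items contribute (response − 1), negatively worded even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. The sketch below illustrates that arithmetic; it is not code from the study, and the example responses are hypothetical.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items (1st, 3rd, ...) are positively
    worded and contribute (response - 1); even-numbered items are negatively
    worded and contribute (5 - response). The summed contributions (0-40)
    are multiplied by 2.5 to yield a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten Likert responses in the range 1-5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r
                for i, r in enumerate(responses))
    return total * 2.5

# Hypothetical response sheet from one participant:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))  # → 85.0
```

A score around 68 is conventionally treated as average usability, which is useful context when interpreting the ratings reported here.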
Despite generally positive usability scores, participants reported recurring frustrations. Five themes emerged from the qualitative data, each paired with a representative participant quote.
Users cannot predict what an agent is actually able to do.
“You've got to sit there and make sure that your initial prompt is perfect.”
Users can't predict the agent's actual capabilities, turning delegation into frustrating 'prompt gambling.'
Agents expect delegation before earning user confidence.
“I was waiting for it to ask me for some more information.”
Agents expect immediate delegation without first establishing credibility through preference elicitation or demonstrated competence on sensitive tasks.
Agents do not adapt their interaction style to the user or task.
“It's kind of like I'm giving you a job, and you're throwing the job back at me.”
They act as 'lone wolf' execution tools that fail to adapt to a user's need for hands-on guidance or mid-task oversight.
Agents overwhelm users with output and ask for inarticulable input.
“Oh, my God! It threw out so much stuff... it's almost an overwhelming amount of information.”
Agents generate excessive, poorly formatted output and force users to articulate complex, subjective preferences, a cognitively demanding task.
Agents lack self-awareness of their errors and limitations.
“It just was kind of circling... it's seeking to provide an answer rather than to say 'I don't know.'”
They lack the self-awareness to recognise their own errors, leading to time-wasting 'try-fail cycles' that require manual human debugging.
Actively collect preferences, skills, and collaboration styles before executing a task.
Develop metacognitive abilities to recognise capability limits and stop confidently.
Adapt the interface and interaction style based on task type and user signals.
Surface planning and checkpoints so users can steer before and during execution.
Support multiple input modalities beyond free-form text prompts.
Enable precise iteration on outputs via contextual, direct-manipulation controls.
@inproceedings{shome2026johnny,
title={Why Johnny Can't Use Agents: Industry Aspirations vs. User Realities with AI Agent Software},
author={Shome, Pradyumna and Krishnan, Sashreek and Das, Sauvik},
booktitle={Proceedings of the ACM Conference on AI and Agentic Systems (CAIS '26)},
year={2026}
}