GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning

dc.contributor.author: Vu, Minh Duc
dc.contributor.author: Wang, Han
dc.contributor.author: Chen, Jieshan
dc.contributor.author: Li, Zhuang
dc.contributor.author: Zhao, Shengdong
dc.contributor.author: Xing, Zhenchang
dc.contributor.author: Chen, Chunyang
dc.date.accessioned: 2025-05-23T12:25:24Z
dc.date.available: 2025-05-23T12:25:24Z
dc.date.issued: 2024-10-13
dc.description.abstract: Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face real-world usability challenges, characterized by inefficiency and difficulty in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GptVoiceTasker, a virtual assistant poised to enhance user experiences and task efficiency on mobile devices. GptVoiceTasker excels at intelligently deciphering user commands and executing relevant device interactions to streamline task completion. For unprecedented tasks, GptVoiceTasker utilises contextual information and on-screen content to continuously explore and execute the tasks. In addition, the system continually learns from historical user commands to automate subsequent task invocations, further enhancing execution efficiency. In our experiments, GptVoiceTasker achieved 84.5% accuracy in parsing human commands into executable actions and 85.7% accuracy in automating multi-step tasks. In our user study, GptVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We have made GptVoiceTasker open-source, inviting further research into LLM utilization for diverse tasks through prompt engineering and leveraging user usage data to improve efficiency.
dc.description.status: Peer-reviewed
dc.identifier.isbn: 9798400706288
dc.identifier.scopus: 85215098692
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85215098692&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733752276
dc.language.iso: en
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.ispartof: UIST 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
dc.relation.ispartofseries: 37th Annual ACM Symposium on User Interface Software and Technology, UIST 2024
dc.relation.ispartofseries: UIST 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
dc.rights: Publisher Copyright: © 2024 Owner/Author.
dc.title: GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning
dc.type: Conference paper
dspace.entity.type: Publication
local.contributor.affiliation: Vu, Minh Duc; CSIRO
local.contributor.affiliation: Wang, Han; Monash University
local.contributor.affiliation: Chen, Jieshan; Monash University
local.contributor.affiliation: Li, Zhuang; CSIRO
local.contributor.affiliation: Zhao, Shengdong; City University of Hong Kong
local.contributor.affiliation: Xing, Zhenchang; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Chen, Chunyang; Technical University of Munich
local.identifier.doi: 10.1145/3654777.3676356
local.identifier.pure: 36bc2f4c-0cf4-4d2e-9dc4-a72f8f83dced
local.identifier.url: https://www.scopus.com/pages/publications/85215098692
local.type.status: Published