Visual Intelligence For GUI Automation And Beyond

Date

2024

Authors

Xie, Mulong

Abstract

The revolutionary advancement of computing hardware and information technology has profoundly reshaped the way humans interact with electronic devices. Evolving from the esoteric textual Command-Line Interface (CLI), the Graphical User Interface (GUI), which uses graphical elements to convey information and provide intuitive access to the underlying system, has become the dominant form of human-device interface. Meanwhile, GUI development involves a variety of specialized processes and has grown into an independent subject with a plethora of problems that require thorough study. Software engineering research has therefore paid increasing attention to these problems through data-driven investigation and automation, aiming to facilitate or automate the repetitive work in GUI development phases such as requirements, design, programming, and testing. This requires the intelligence to perceive and analyze GUI data and then behave adaptively. In particular, visual intelligence based on GUI images imitates how humans view a GUI and gives GUI automation tools a more universal way to achieve their goals without depending on the underlying system. However, owing to the particular characteristics of GUIs, many advanced techniques from other disciplines (e.g., computer vision, machine learning) cannot be readily applied, while GUI-specific visual approaches still require further research to handle complicated cases. Therefore, we introduce GUI visual intelligence to address challenges ranging from GUI widget detection ("seeing") to GUI perceptual grouping ("understanding"), and we apply it to two classic GUI automation tasks ("acting"): image-to-code generation and automated testing.
Furthermore, beyond conventional GUIs, we use visual intelligence to build an online-form generator that converts paper or electronic forms into interactive online forms with rich accessibility features that ease form filling, demonstrating the potential of visual intelligence in wider fields.

Type

Thesis (PhD)

Restricted until

2024-01-12