w95/tinyclick

TinyClick: Single-Turn Agent for Empowering GUI Automation

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This repository contains the code for running the model from the paper "TinyClick: Single-Turn Agent for Empowering GUI Automation".

About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the Vision-Language Model Florence-2-Base. The main goal of the agent is to click on the desired UI element, given a screenshot and a user command. It demonstrates strong performance on ScreenSpot and OmniAct while maintaining a compact size of 0.27B parameters and minimal latency.
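To make the "click on a UI element" step more concrete: Florence-2 expresses coordinates as location tokens quantized into 1000 bins per axis. The sketch below shows one way such tokens could be mapped back to pixel positions on a screenshot. The function name `loc_tokens_to_pixels`, the `<loc_N>` pair format, and the example string are assumptions made for illustration; the actual output format of TinyClick is not described in this README and may differ.

```python
import re

# Assumption: coordinates appear as Florence-2-style tokens such as <loc_500>,
# quantized into NUM_BINS bins per axis. TinyClick's real output may differ.
NUM_BINS = 1000
LOC_PAIR = re.compile(r"<loc_(\d+)><loc_(\d+)>")


def loc_tokens_to_pixels(generated: str, width: int, height: int) -> list[tuple[int, int]]:
    """Map every <loc_x><loc_y> pair found in `generated` to pixel coordinates."""
    points = []
    for x_bin, y_bin in LOC_PAIR.findall(generated):
        # Scale the bin index back to the screenshot size (integer arithmetic).
        x = int(x_bin) * width // NUM_BINS
        y = int(y_bin) * height // NUM_BINS
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Hypothetical model output for a 1920x1080 screenshot.
    print(loc_tokens_to_pixels("click<loc_500><loc_250>", 1920, 1080))
```

A string containing four consecutive location tokens (for example, a bounding box) would yield two points, one per corner.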

Installation

Before running, set up the environment and install the required packages:

pip install -r requirements.txt

Usage

To see an example inference with TinyClick, run this command:

python3 main.py --image-path "<PATH>" --text "<COMMAND>"
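
The command above can also be invoked from a Python script. The following is a minimal sketch that only relies on the two flags shown (`--image-path` and `--text`). The helper names `build_command` and `run_tinyclick` are hypothetical, and it assumes that main.py is in the current working directory and writes its result to standard output.

```python
import subprocess
import sys


def build_command(image_path: str, text: str) -> list[str]:
    """Assemble the argument list for the main.py call shown above."""
    # sys.executable is used so the same interpreter (and environment) is reused.
    return [sys.executable, "main.py", "--image-path", image_path, "--text", text]


def run_tinyclick(image_path: str, text: str) -> str:
    """Run main.py and return whatever it prints to stdout.

    Assumption: the script reports its prediction on stdout.
    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        build_command(image_path, text),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the command contains spaces or quotes.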

Citation

@misc{pawlowski2024tinyclicksingleturnagentempowering,
    title={TinyClick: Single-Turn Agent for Empowering GUI Automation}, 
    author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
    year={2024},
    eprint={2410.11871},
    archivePrefix={arXiv},
    primaryClass={cs.HC},
    url={https://arxiv.org/abs/2410.11871}, 
}

License

This project is distributed under the MIT License. See LICENSE for more information.