Deploy the Full Stack Desktop Agent

By versatileai | July 11, 2025



Aymeric Roucher

TL;DR: screenenv is a powerful Python library for creating isolated Ubuntu desktop environments in Docker containers, for testing and deploying GUI agents (also known as Computer Use Agents). With built-in support for the Model Context Protocol (MCP), it has never been easier to deploy desktop agents that can see, click, and interact with real applications.

What is screenenv?

https://www.youtube.com/watch?v=vcutesrsj5a

Imagine you need to automate desktop tasks, test GUI applications, or build an AI agent that can interact with software. Traditionally, this required complex VM setups and brittle automation frameworks.

screenenv changes this by providing sandboxed desktop environments that run in Docker containers. Think of it as a complete virtual desktop session that your code fully controls: not just clicking buttons and typing text, but managing the entire desktop experience, including launching applications, organizing windows, manipulating files, running terminal commands, and recording the whole session.

Why screenenv?

  • Full Desktop Control: mouse and keyboard automation, window management, application launching, file manipulation, terminal access, and screen recording.
  • Dual Integration Modes: a Model Context Protocol (MCP) server for AI systems, and a direct Sandbox API for agent or backend logic.
  • Docker Native: environments are isolated, reproducible, and can be deployed anywhere in less than 10 seconds. Supports AMD64 and ARM64 architectures.
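Screen recording is the one capability in this list that the examples below don't demonstrate. As a minimal, hedged sketch of how it might look (the start_recording and end_recording method names are assumptions, not confirmed API; check the repository for the exact calls):

from screenenv import Sandbox

sandbox = Sandbox()
# Hypothetical method names -- check the screenenv repository for the real API.
sandbox.start_recording()              # assumed: begin capturing the session
sandbox.launch("xfce4-terminal")       # launch an app inside the sandbox
sandbox.end_recording("session.mp4")   # assumed: stop and save the recording
sandbox.close()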

🎯 One-line setup

from screenenv import Sandbox

sandbox = Sandbox()
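This single line spins up an isolated desktop session in a Docker container. The constructor also accepts options that appear later in this post, such as headless mode and screen resolution:

from screenenv import Sandbox

# Options used later in this post: a visible (non-headless) display at 1920x1080.
sandbox = Sandbox(headless=False, resolution=(1920, 1080))
sandbox.close()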

Two integration approaches

screenenv offers two complementary ways to integrate with agents and backend systems, providing the flexibility to choose the best approach for your architecture.

Option 1: Direct Sandbox API

Ideal for custom agent frameworks, existing backends, or when you need fine-grained control:

from io import BytesIO
from PIL import Image
from screenenv import Sandbox

sandbox = Sandbox(headless=False)
sandbox.launch("xfce4-terminal")
sandbox.write("echo 'Custom Agent Logic'")
screenshot_bytes = sandbox.screenshot()
image = Image.open(BytesIO(screenshot_bytes))
...
sandbox.close()

Option 2: MCP Server Integration

Ideal for AI systems that support the Model Context Protocol (MCP):

import asyncio
import base64
from io import BytesIO
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from PIL import Image
from screenenv import MCPRemoteServer

server = MCPRemoteServer(headless=False)
print(f"MCP Server URL: {server.server_url}")

async def mcp_session():
    async with streamablehttp_client(server.server_url) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            print(await session.list_tools())
            response = await session.call_tool("screenshot", {})
            image_bytes = base64.b64decode(response.content[0].data)
            image = Image.open(BytesIO(image_bytes))

asyncio.run(mcp_session())
server.close()

This dual approach means screenenv adapts to your existing infrastructure rather than forcing changes to your agent architecture.

Create a desktop agent with screenenv and smolagents

screenenv natively supports smolagents, making it easy to build custom desktop agents for automation. Here's how to create your own AI-powered desktop agent in just a few steps:

1. Select a model

Select the backend VLM that will power the agent.

import os

# Option A: OpenAI API
from smolagents import OpenAIServerModel
model = OpenAIServerModel(model_id="gpt-4.1", api_key=os.getenv("OPENAI_API_KEY"))

# Option B: Hugging Face Inference Providers
from smolagents import HfApiModel
model = HfApiModel(model_id="Qwen/Qwen2.5-VL-7B-Instruct", token=os.getenv("HF_TOKEN"), provider="nebius")

# Option C: local inference with transformers
from smolagents import TransformersModel
model = TransformersModel(model_id="Qwen/Qwen2.5-VL-7B-Instruct", device_map="auto", torch_dtype="auto", trust_remote_code=True)

# Option D: LiteLLM (e.g. Anthropic)
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")

2. Define a custom desktop agent

Inherit from DesktopAgentBase and implement the _setup_desktop_tools method to build your own action space!

from screenenv import DesktopAgentBase, Sandbox
from smolagents import Model, Tool, tool
from smolagents.monitoring import LogLevel
from typing import List

class CustomDesktopAgent(DesktopAgentBase):
    """Desktop Automation Agent"""

    def __init__(
        self,
        model: Model,
        data_dir: str,
        desktop: Sandbox,
        tools: List[Tool] | None = None,
        max_steps: int = 200,
        verbosity_level: LogLevel = LogLevel.INFO,
        planning_interval: int | None = None,
        use_v1_prompt: bool = False,
        **kwargs,
    ):
        super().__init__(
            model=model,
            data_dir=data_dir,
            desktop=desktop,
            tools=tools,
            max_steps=max_steps,
            verbosity_level=verbosity_level,
            planning_interval=planning_interval,
            use_v1_prompt=use_v1_prompt,
            **kwargs,
        )

    def _setup_desktop_tools(self) -> None:
        """Define your custom tools here."""

        @tool
        def click(x: int, y: int) -> str:
            """
            Click at the specified coordinates.
            Args:
                x: The x coordinate of the click
                y: The y coordinate of the click
            """
            self.desktop.left_click(x, y)
            return f"Clicked at ({x}, {y})"

        self.tools["click"] = click

        @tool
        def write(text: str) -> str:
            """
            Type the specified text at the current cursor position.
            Args:
                text: The text to type
            """
            self.desktop.write(text, delay_in_ms=10)
            return f"Typed text: '{text}'"

        self.tools["write"] = write

        @tool
        def press(key: str) -> str:
            """
            Press a keyboard key or key combination.
            Args:
                key: The key to press (e.g. "enter", "space", "backspace") or a combination such as "ctrl+a" or "ctrl+shift+a"
            """
            self.desktop.press(key)
            return f"Pressed key: {key}"

        self.tools["press"] = press

        @tool
        def open(file_or_url: str) -> str:
            """
            Open a URL in the browser or a file with its default application.
            Args:
                file_or_url: The URL or file to open
            """
            self.desktop.open(file_or_url)
            self.logger.log(f"Opening: {file_or_url}")
            return f"Opened: {file_or_url}"

        self.tools["open"] = open

        @tool
        def launch_app(app_name: str) -> str:
            """
            Launch the specified application.
            Args:
                app_name: The name of the application to launch
            """
            self.desktop.launch(app_name)
            return f"Launched application: {app_name}"

        self.tools["launch_app"] = launch_app

        ...
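The trailing ellipsis is where additional tools go. As a minimal sketch of how the action space could be extended inside _setup_desktop_tools (the double_click method on the sandbox is an assumption here, not a confirmed part of the API):

        @tool
        def double_click(x: int, y: int) -> str:
            """
            Double-click at the specified coordinates.
            Args:
                x: The x coordinate of the double-click
                y: The y coordinate of the double-click
            """
            # Assumed sandbox method; check the screenenv API for the exact name.
            self.desktop.double_click(x, y)
            return f"Double-clicked at ({x}, {y})"

        self.tools["double_click"] = double_click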

3. Run the agent on a desktop task

from screenenv import Sandbox

sandbox = Sandbox(headless=False, resolution=(1920, 1080))
agent = CustomDesktopAgent(model=model, data_dir="data", desktop=sandbox)

task = 'Open LibreOffice and write a report of about 300 words on the topic "AI Agent Workflow for 2025" and save the document.'
result = agent.run(task)
print(f"📄 Result: {result}")
sandbox.close()

If you get a permission denied error from Docker, try running the agent with sudo -E python -m test.py, or add your user to the docker group.
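The second option is usually the cleaner fix; on most Linux systems it looks like this:

# Add the current user to the docker group, then refresh group membership
# (logging out and back in also works).
sudo usermod -aG docker $USER
newgrp docker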

For a complete implementation, see the CustomDesktopAgent source on GitHub.

Get started today

pip install screenenv
git clone git@github.com:huggingface/screenenv.git
cd screenenv
python -m examples.desktop_agent

What’s next?

screenenv is extending beyond Linux to support Android, macOS, and Windows, aiming to unlock true cross-platform GUI automation. This will let developers and researchers build agents that generalize across environments with minimal setup.

These advances open the way to reproducible, sandboxed environments that are well suited to benchmarking and evaluation.

Repository: https://github.com/huggingface/screenenv
