Deploy the Full Stack Desktop Agent

By versatileai · July 11, 2025 · 5 Mins Read

Aymeric Roucher avatar

TL;DR: ScreenEnv is a powerful Python library that lets you create isolated Ubuntu desktop environments in Docker containers for testing and deploying GUI agents (also known as Computer Use agents). With built-in support for the Model Context Protocol (MCP), it makes it easier to deploy desktop agents that can see, click, and interact with real applications.

What is screenenv?

https://www.youtube.com/watch?v=vcutesrsj5a

Imagine you want to automate desktop tasks, test GUI applications, or build an AI agent that can interact with software. This used to require a complex VM setup and brittle automation frameworks.

ScreenEnv changes this by providing a sandboxed desktop environment that runs in a Docker container. Think of it as a complete virtual desktop session that your code can fully control: not just clicking buttons and typing text, but managing the entire desktop experience, including launching applications, organizing windows, handling files, running terminal commands, and recording the entire session.

Why ScreenEnv?

  • Full Desktop Control: complete mouse and keyboard automation, window management, application launching, file operations, terminal access, and screen recording.
  • Dual Integration Modes: supports both the Model Context Protocol (MCP) for AI systems and a direct Sandbox API, so it can adapt to any agent or backend logic.
  • Docker Native: no complex VM setup, just Docker. Environments are isolated, reproducible, and can be deployed anywhere in less than 10 seconds. Supports AMD64 and ARM64 architectures.

🎯 One-Line Setup

from screenenv import Sandbox
sandbox = Sandbox()  # that's it

Two integrated approaches

ScreenEnv offers two complementary ways to integrate with agents and backend systems, giving you the flexibility to choose the approach that best fits your architecture.

Option 1: Direct Sandbox API

Well suited to custom agent frameworks, existing backends, or cases where you need fine-grained control:

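from io import BytesIO

from PIL import Image
from screenenv import Sandbox

# Direct programmatic control
sandbox = Sandbox(headless=False)
sandbox.launch("xfce4-terminal")
sandbox.write("echo 'Custom agent logic'")
screenshot_bytes = sandbox.screenshot()  # raw image bytes of the current screen
image = Image.open(BytesIO(screenshot_bytes))
# …
sandbox.close()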
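The snippet above calls close() as its last statement, so an exception raised earlier would skip the cleanup and could leave the container running. One way to guard against that, sketched here, is the standard library's contextlib.closing, which calls close() on exit for any object that has such a method. FakeSandbox is a hypothetical stand-in so the example runs without Docker; it is not part of screenenv, and the article does not say whether Sandbox has context-manager support of its own.

```python
from contextlib import closing


class FakeSandbox:
    """Hypothetical stand-in for screenenv.Sandbox (only records calls)."""

    def __init__(self) -> None:
        self.closed = False
        self.typed: list[str] = []

    def write(self, text: str) -> None:
        if self.closed:
            raise RuntimeError("sandbox is closed")
        self.typed.append(text)

    def close(self) -> None:
        self.closed = True


def run_session(sandbox: FakeSandbox, fail: bool = False) -> None:
    """Drive the sandbox; closing() guarantees close() runs even on error."""
    with closing(sandbox) as sb:
        sb.write("echo 'Custom agent logic'")
        if fail:
            raise ValueError("simulated agent failure")


ok = FakeSandbox()
run_session(ok)

broken = FakeSandbox()
try:
    run_session(broken, fail=True)
except ValueError:
    pass  # the error still propagates, but cleanup has already run
```

With a real sandbox the same shape would be `with closing(Sandbox(headless=False)) as sandbox: ...`, assuming close() is safe to call once at exit.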

Option 2: MCP Server Integration

Ideal for AI systems that support the Model Context Protocol:

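import asyncio
import base64
from io import BytesIO

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from PIL import Image
from screenenv import MCPRemoteServer

# Start the MCP server for AI integration
server = MCPRemoteServer(headless=False)
print(f"MCP Server URL: {server.server_url}")

# AI agents can now connect and control the desktop
async def mcp_session():
    # streamablehttp_client yields (read_stream, write_stream, get_session_id)
    async with streamablehttp_client(server.server_url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.list_tools())

            # Take a screenshot through the MCP tool and decode it
            response = await session.call_tool("screenshot", {})
            image_bytes = base64.b64decode(response.content[0].data)
            image = Image.open(BytesIO(image_bytes))

asyncio.run(mcp_session())
server.close()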
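The screenshot tool returns its image as base64 text, so a client has to decode it before use. The sketch below shows that decoding step with only the standard library, plus a basic sanity check. It assumes the payload is a PNG, which the article does not state. decode_png_payload is a hypothetical helper, and the payload built here is a hand-made header standing in for the `response.content[0].data` value from the MCP call.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_payload(data_b64: str) -> tuple[bytes, int, int]:
    """Decode base64 text and return (raw_bytes, width, height) for a PNG."""
    raw = base64.b64decode(data_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    if len(raw) < 24 or raw[12:16] != b"IHDR":
        raise ValueError("PNG header is truncated or malformed")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", raw[16:24])
    return raw, width, height


# Fake payload: PNG signature plus the start of an IHDR chunk (1920x1080).
header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 1920, 1080)
)
payload = base64.b64encode(header).decode("ascii")

_, width, height = decode_png_payload(payload)
print(width, height)  # prints "1920 1080"
```

Checking the header before handing bytes to an image library gives a clearer error when a tool call returns something other than an image.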

This dual approach means that ScreenEnv adapts to your existing infrastructure rather than forcing you to change your agent architecture.

Create a Desktop Agent with ScreenEnv and smolagents

ScreenEnv natively supports smolagents, making it easy to build your own custom desktop agents for automation. Here is how to create your own AI-powered desktop agent in just a few steps:

1. Select a model

Select the backend VLM that will power the agent.

import os

# Pick one of the options below; each one assigns `model`.

# Option A: OpenAI GPT-4.1
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="gpt-4.1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Option B: Hugging Face inference providers
from smolagents import HfApiModel
model = HfApiModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    token=os.getenv("HF_TOKEN"),
    provider="nebius",
)

# Option C: local Transformers model
from smolagents import TransformersModel
model = TransformersModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Option D: other providers via LiteLLM
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")
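Because the hosted options depend on credentials read from the environment, a script can choose a backend based on which variables are set. The sketch below is one possible selection rule, not something the library provides: pick_backend and the labels it returns are hypothetical, and the fallback order is an arbitrary choice.

```python
import os
from collections.abc import Mapping


def pick_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return a label for the first backend whose credentials are available."""
    if env.get("OPENAI_API_KEY"):
        return "openai"        # would map to OpenAIServerModel
    if env.get("HF_TOKEN"):
        return "hf-inference"  # would map to HfApiModel
    return "local"             # would map to TransformersModel (no API key)


print(pick_backend({"HF_TOKEN": "hf_example"}))  # prints "hf-inference"
```

Passing the environment in as a mapping keeps the function easy to test without touching real credentials.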

2. Define a custom desktop agent

Inherit from DesktopAgentBase and implement the _setup_desktop_tools method to build your own action space!

from typing import List

from screenenv import DesktopAgentBase, Sandbox
from smolagents import Model, Tool, tool
from smolagents.monitoring import LogLevel


class CustomDesktopAgent(DesktopAgentBase):
    """Agent for desktop automation"""

    def __init__(
        self,
        model: Model,
        data_dir: str,
        desktop: Sandbox,
        tools: List[Tool] | None = None,
        max_steps: int = 200,
        verbosity_level: LogLevel = LogLevel.INFO,
        planning_interval: int | None = None,
        use_v1_prompt: bool = False,
        **kwargs,
    ):
        super().__init__(
            model=model,
            data_dir=data_dir,
            desktop=desktop,
            tools=tools,
            max_steps=max_steps,
            verbosity_level=verbosity_level,
            planning_interval=planning_interval,
            use_v1_prompt=use_v1_prompt,
            **kwargs,
        )

    def _setup_desktop_tools(self) -> None:
        """Define your custom tools here."""

        @tool
        def click(x: int, y: int) -> str:
            """
            Clicks at the specified coordinates.
            Args:
                x: The x-coordinate of the click
                y: The y-coordinate of the click
            """
            self.desktop.left_click(x, y)
            return f"Clicked at ({x}, {y})"

        self.tools["click"] = click

        @tool
        def write(text: str) -> str:
            """
            Types the specified text at the current cursor position.
            Args:
                text: The text to type
            """
            self.desktop.write(text, delay_in_ms=10)
            return f"Typed text: '{text}'"

        self.tools["write"] = write

        @tool
        def press(key: str) -> str:
            """
            Presses a keyboard key or key combination.
            Args:
                key: The key to press (e.g. "enter", "space", "backspace") or a multi-key string like "ctrl+a" or "ctrl+shift+a".
            """
            self.desktop.press(key)
            return f"Pressed key: {key}"

        self.tools["press"] = press

        @tool
        def open(file_or_url: str) -> str:
            """
            Opens a browser at the specified URL, or opens a file with its default application.
            Args:
                file_or_url: The URL or file to open
            """
            self.desktop.open(file_or_url)
            self.logger.log(f"Opening: {file_or_url}")
            return f"Opened: {file_or_url}"

        # Registered like the other tools so the agent can actually call it
        self.tools["open"] = open

        @tool
        def launch_app(app_name: str) -> str:
            """
            Launches the specified application.
            Args:
                app_name: The name of the application to launch
            """
            self.desktop.launch(app_name)
            return f"Launched application: {app_name}"

        self.tools["launch_app"] = launch_app

        # …
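A vision-language model can occasionally propose click coordinates that fall outside the screen. The article does not say how left_click treats such values, so a custom click tool may want to sanitize them first. The sketch below shows one way to do that. clamp_to_screen is a hypothetical helper, not part of screenenv or smolagents, and the default resolution simply mirrors the 1920x1080 sandbox used in this article.

```python
def clamp_to_screen(
    x: int,
    y: int,
    resolution: tuple[int, int] = (1920, 1080),
) -> tuple[int, int]:
    """Clamp a point so it falls inside a screen of the given (width, height)."""
    width, height = resolution
    if width <= 0 or height <= 0:
        raise ValueError("resolution must be positive")
    # Valid pixel indices run from 0 to width - 1 and from 0 to height - 1.
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))


print(clamp_to_screen(2500, -40))  # prints "(1919, 0)"
print(clamp_to_screen(100, 200))   # prints "(100, 200)"
```

Inside the click tool, the call would then look like `self.desktop.left_click(*clamp_to_screen(x, y))`. Whether to clamp silently or return an error message to the model is a design choice: an error message lets the model correct itself on the next step.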

3. Run the agent on the desktop task

from screenenv import Sandbox

# Define your sandbox environment
sandbox = Sandbox(headless=False, resolution=(1920, 1080))

# Create your agent
agent = CustomDesktopAgent(
    model=model,
    data_dir="data",
    desktop=sandbox,
)

# Run a task
task = "Open LibreOffice, write a report of approximately 300 words on the topic 'AI Agent Workflow in 2025', and save the document."

result = agent.run(task)
print(f"📄 Result: {result}")

sandbox.close()

If you hit a Docker "access denied" error, you can try running the agent with sudo (for example, sudo -E python test.py) or adding your user to the docker group.
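Before starting a sandbox, a script can check whether the current user belongs to the docker group and print a hint if not. The sketch below uses only the standard library and is Unix-only, since the grp module is not available on Windows. user_in_group is a hypothetical helper, and a negative result is only a hint: root, or a rootless Docker setup, may still have access.

```python
import grp
import os


def user_in_group(group_name: str = "docker") -> bool:
    """Return True if the current process belongs to the named Unix group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this machine
    return gid == os.getegid() or gid in os.getgroups()


if not user_in_group():
    print("Current user is not in the 'docker' group; Docker may deny access.")
```

Group membership added with usermod normally takes effect only after logging out and back in, so this check can still fail right after the change.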

For a comprehensive implementation, see the CustomDesktopAgent source on GitHub.

Get Started Today

# Install screenenv
pip install screenenv

# Try the examples
git clone git@github.com:huggingface/screenenv.git
cd screenenv
python -m examples.desktop_agent

What’s next?

ScreenEnv aims to extend beyond Linux to support Android, macOS, and Windows, unlocking true cross-platform GUI automation. This would allow developers and researchers to build agents that generalize across environments with minimal setup.

These advances pave the way for reproducible, sandboxed environments that are well suited to benchmarking and evaluation.

Repository: https://github.com/huggingface/screenenv
