TL;DR: ScreenEnv is a powerful Python library for creating isolated Ubuntu desktop environments inside Docker containers, built for testing and deploying GUI agents (aka Computer Use Agents). With built-in support for the Model Context Protocol (MCP), it has never been easier to deploy desktop agents that can see, click, and interact with real applications.
What is screenenv?
https://www.youtube.com/watch?v=vcutesrsj5a
Imagine you need to automate desktop tasks, test GUI applications, or build an AI agent that can interact with real software. Traditionally, this meant complex VM setups and brittle automation frameworks.
ScreenEnv changes this by providing sandboxed desktop environments that run in Docker containers. Think of it as a complete virtual desktop session that your code fully controls: not just clicking buttons and typing text, but managing the entire desktop experience, such as launching applications, organizing windows, handling files, running terminal commands, and recording the whole session.
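To make that concrete, here is a minimal sketch of a scripted session. It only uses Sandbox calls that appear later in this post; the application name, command, and URL are just illustrative:

from screenenv import Sandbox

sandbox = Sandbox()                          # isolated Ubuntu desktop in a Docker container
sandbox.launch("xfce4-terminal")             # launch an application (illustrative app name)
sandbox.open("https://huggingface.co")       # open a URL with the default browser
sandbox.write("echo hello")                  # type into the focused window
sandbox.press("enter")                       # press a key
png_bytes = sandbox.screenshot()             # capture the current screen
sandbox.close()                              # tear everything down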
Why ScreenEnv?
🖥️ Full Desktop Control: mouse and keyboard automation, window management, application launching, file manipulation, terminal access, and screen recording.
🔌 Two Integration Modes: a Model Context Protocol (MCP) server for AI systems, and a direct Sandbox API for custom agent or backend logic.
🐳 Docker Native: environments are isolated, reproducible, and can be deployed anywhere in under 10 seconds. Supports AMD64 and ARM64 architectures.
🎯 One-line setup
from screenenv import Sandbox

sandbox = Sandbox()
Two integrated approaches
ScreenEnv offers two complementary ways to integrate with agents and backend systems, giving you the flexibility to choose the approach that best fits your architecture.
Option 1: Direct Sandbox API
Perfect if you are building a custom agent framework, integrating with an existing backend, or need fine-grained control:
from io import BytesIO
from PIL import Image
from screenenv import Sandbox

sandbox = Sandbox(headless=False)

sandbox.launch("xfce4-terminal")
sandbox.write('echo "Custom agent logic"')
screenshot_bytes = sandbox.screenshot()
image = Image.open(BytesIO(screenshot_bytes))
# ...
sandbox.close()
Option 2: MCP Server Integration
Ideal for AI systems that support the Model Context Protocol.
import asyncio
import base64
from io import BytesIO

from PIL import Image
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from screenenv import MCPRemoteServer

server = MCPRemoteServer(headless=False)
print(f"MCP Server URL: {server.server_url}")

async def mcp_session():
    async with streamablehttp_client(server.server_url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.list_tools())
            response = await session.call_tool("screenshot", {})
            image_bytes = base64.b64decode(response.content[0].data)
            image = Image.open(BytesIO(image_bytes))

asyncio.run(mcp_session())
server.close()
This dual approach means ScreenEnv adapts to your existing infrastructure rather than forcing you to change your agent architecture.
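For example, if your stack already has its own tool abstraction, the Sandbox can simply be wrapped behind it. The helpers below are a hypothetical illustration (the function names are not part of the library; only the Sandbox calls are):

from io import BytesIO
from PIL import Image
from screenenv import Sandbox

sandbox = Sandbox(headless=False)

def screenshot_tool() -> Image.Image:
    # Hypothetical wrapper: expose the sandbox screen to any agent framework.
    return Image.open(BytesIO(sandbox.screenshot()))

def type_and_submit(text: str) -> None:
    # Hypothetical wrapper: type text into the focused window and press Enter.
    sandbox.write(text)
    sandbox.press("enter")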
Create a desktop agent using ScreenEnv and smolagents
ScreenEnv natively supports smolagents, making it easy to build your own custom desktop agents for automation. Here's how to create an AI-powered desktop agent in just a few steps:
1. Select a model
Select the backend VLM that will power the agent.
import os

# Option A: OpenAI API
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="gpt-4.1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Option B: Hugging Face Inference Providers
from smolagents import HfApiModel
model = HfApiModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    token=os.getenv("HF_TOKEN"),
    provider="nebius",
)

# Option C: local weights with transformers
from smolagents import TransformersModel
model = TransformersModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Option D: LiteLLM (e.g. Anthropic Claude)
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")
2. Define a custom desktop agent
Inherit from DesktopAgentBase and implement the _setup_desktop_tools method to build your own action space!
from screenenv import DesktopAgentBase, Sandbox
from smolagents import Model, Tool, tool
from smolagents.monitoring import LogLevel
from typing import List

class CustomDesktopAgent(DesktopAgentBase):
    """Agent for desktop automation"""

    def __init__(
        self,
        model: Model,
        data_dir: str,
        desktop: Sandbox,
        tools: List[Tool] | None = None,
        max_steps: int = 200,
        verbosity_level: LogLevel = LogLevel.INFO,
        planning_interval: int | None = None,
        use_v1_prompt: bool = False,
        **kwargs,
    ):
        super().__init__(
            model=model,
            data_dir=data_dir,
            desktop=desktop,
            tools=tools,
            max_steps=max_steps,
            verbosity_level=verbosity_level,
            planning_interval=planning_interval,
            use_v1_prompt=use_v1_prompt,
            **kwargs,
        )

    def _setup_desktop_tools(self) -> None:
        """Define your custom tools here."""

        @tool
        def click(x: int, y: int) -> str:
            """
            Performs a left-click at the specified coordinates.
            Args:
                x: The x coordinate (pixels from the left edge)
                y: The y coordinate (pixels from the top edge)
            """
            self.desktop.left_click(x, y)
            return f"Clicked at ({x}, {y})"

        self.tools["click"] = click

        @tool
        def write(text: str) -> str:
            """
            Types the specified text at the current cursor position.
            Args:
                text: The text to type
            """
            self.desktop.write(text, delay_in_ms=10)
            return f"Typed text: '{text}'"

        self.tools["write"] = write

        @tool
        def press(key: str) -> str:
            """
            Presses a keyboard key or key combination.
            Args:
                key: The key to press (e.g. "enter", "space", "backspace") or a combination such as "ctrl+a" or "ctrl+shift+a"
            """
            self.desktop.press(key)
            return f"Pressed key: {key}"

        self.tools["press"] = press

        @tool
        def open(file_or_url: str) -> str:
            """
            Opens a URL in the default browser, or a file with its default application.
            Args:
                file_or_url: The URL or file path to open
            """
            self.desktop.open(file_or_url)
            self.logger.log(f"Opening: {file_or_url}")
            return f"Opened: {file_or_url}"

        self.tools["open"] = open

        @tool
        def launch_app(app_name: str) -> str:
            """
            Launches the specified application.
            Args:
                app_name: The name of the application to launch
            """
            self.desktop.launch(app_name)
            return f"Launched application: {app_name}"

        self.tools["launch_app"] = launch_app

        # ... add more tools as needed
3. Run the agent on a desktop task
from screenenv import Sandbox

sandbox = Sandbox(headless=False, resolution=(1920, 1080))

agent = CustomDesktopAgent(
    model=model,
    data_dir="data",
    desktop=sandbox,
)

task = "Open LibreOffice and write a report of about 300 words on the topic 'AI Agent Workflow for 2025', then save the document."
result = agent.run(task)
print(f"📄 Result: {result}")

sandbox.close()
If you hit a Docker permission error, try running the agent with sudo -E python ... or add your user to the docker group.
For a complete implementation, see the full CustomDesktopAgent source on GitHub.
Get started today
pip install screenenv

To try the example agent from source:

git clone git@github.com:huggingface/screenenv.git
cd screenenv
python -m examples.desktop_agent
What’s next?
Next, ScreenEnv will extend beyond Linux to support Android, macOS, and Windows, aiming to unlock true cross-platform GUI automation. This will allow developers and researchers to build agents that generalize across environments with minimal setup.
These advances also pave the way for reproducible, sandboxed environments that are ideal for benchmarking and evaluation.
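As a rough illustration of what that could look like, here is a minimal sketch of an evaluation loop that gives every task a fresh, identical sandbox. It reuses the CustomDesktopAgent and model from the previous section; the task list and result handling are hypothetical placeholders, not part of the library:

from screenenv import Sandbox

# Hypothetical benchmark tasks; a real suite would load these from a dataset.
TASKS = [
    "Open the text editor and save an empty file named notes.txt",
    "Launch the browser and navigate to https://huggingface.co",
]

results = []
for task in TASKS:
    # A fresh sandbox per task keeps runs isolated and reproducible.
    sandbox = Sandbox(headless=True, resolution=(1920, 1080))
    try:
        agent = CustomDesktopAgent(model=model, data_dir="data", desktop=sandbox)
        results.append((task, agent.run(task)))
    finally:
        sandbox.close()

for task, result in results:
    print(f"{task} -> {result}")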
Repository: https://github.com/huggingface/screenenv