TL;DR: ScreenEnv is a powerful Python library for creating isolated Ubuntu desktop environments inside Docker containers, built for testing and deploying GUI agents (aka Computer Use Agents). With built-in support for the Model Context Protocol (MCP), it has never been easier to deploy desktop agents that can see, click, and interact with real applications.
What is screenenv?
https://www.youtube.com/watch?v=vcutesrsj5a
Imagine you need to automate desktop tasks, test GUI applications, or build an AI agent that can interact with real software. Traditionally, this meant complex VM setups and brittle automation frameworks.
ScreenEnv changes this by providing sandboxed desktop environments that run in Docker containers. Think of it as a complete virtual desktop session that your code fully controls: not just clicking buttons and typing text, but managing the entire desktop experience, such as launching applications, organizing windows, handling files, running terminal commands, and recording the whole session.
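To make that concrete, here is a minimal sketch of a scripted session. It only uses Sandbox calls that appear later in this post; the application name, command, and URL are just illustrative:

from screenenv import Sandbox

sandbox = Sandbox()                          # isolated Ubuntu desktop in a Docker container
sandbox.launch("xfce4-terminal")             # launch an application (illustrative app name)
sandbox.open("https://huggingface.co")       # open a URL with the default browser
sandbox.write("echo hello")                  # type into the focused window
sandbox.press("enter")                       # press a key
png_bytes = sandbox.screenshot()             # capture the current screen
sandbox.close()                              # tear everything down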
Why ScreenEnv?
🖥️ Full Desktop Control: mouse and keyboard automation, window management, application launching, file manipulation, terminal access, and screen recording.
🔌 Two Integration Modes: a Model Context Protocol (MCP) server for AI systems, and a direct Sandbox API for custom agent or backend logic.
🐳 Docker Native: environments are isolated, reproducible, and can be deployed anywhere in under 10 seconds. Supports AMD64 and ARM64 architectures.
🎯 One-line setup
from screenenv import Sandbox

sandbox = Sandbox()
Two integrated approaches
ScreenEnv offers two complementary ways to integrate with agents and backend systems, giving you the flexibility to choose the approach that best fits your architecture.
Option 1: Direct Sandbox API
Perfect if you are building a custom agent framework, integrating with an existing backend, or need fine-grained control:
from io import BytesIO
from PIL import Image
from screenenv import Sandbox

sandbox = Sandbox(headless=False)

sandbox.launch("xfce4-terminal")
sandbox.write('echo "Custom agent logic"')
screenshot_bytes = sandbox.screenshot()
image = Image.open(BytesIO(screenshot_bytes))
# ...
sandbox.close()
Option 2: MCP Server Integration
Ideal for AI systems that support the Model Context Protocol.
import asyncio
import base64
from io import BytesIO

from PIL import Image
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from screenenv import MCPRemoteServer

server = MCPRemoteServer(headless=False)
print(f"MCP Server URL: {server.server_url}")

async def mcp_session():
    async with streamablehttp_client(server.server_url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.list_tools())
            response = await session.call_tool("screenshot", {})
            image_bytes = base64.b64decode(response.content[0].data)
            image = Image.open(BytesIO(image_bytes))

asyncio.run(mcp_session())
server.close()
This dual approach means ScreenEnv adapts to your existing infrastructure rather than forcing you to change your agent architecture.
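For example, if your stack already has its own tool abstraction, the Sandbox can simply be wrapped behind it. The helpers below are a hypothetical illustration (the function names are not part of the library; only the Sandbox calls are):

from io import BytesIO
from PIL import Image
from screenenv import Sandbox

sandbox = Sandbox(headless=False)

def screenshot_tool() -> Image.Image:
    # Hypothetical wrapper: expose the sandbox screen to any agent framework.
    return Image.open(BytesIO(sandbox.screenshot()))

def type_and_submit(text: str) -> None:
    # Hypothetical wrapper: type text into the focused window and press Enter.
    sandbox.write(text)
    sandbox.press("enter")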
Create a desktop agent using ScreenEnv and smolagents
ScreenEnv natively supports smolagents, making it easy to build your own custom desktop agents for automation. Here's how to create an AI-powered desktop agent in just a few steps:
1. Select a model
Select the backend VLM that will power the agent.
import os

# Option A: OpenAI API
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="gpt-4.1",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Option B: Hugging Face Inference Providers
from smolagents import HfApiModel
model = HfApiModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    token=os.getenv("HF_TOKEN"),
    provider="nebius",
)

# Option C: local weights with transformers
from smolagents import TransformersModel
model = TransformersModel(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Option D: LiteLLM (e.g. Anthropic Claude)
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")
2. Define a custom desktop agent
Inherit from DesktopAgentBase and implement the _setup_desktop_tools method to build your own action space!
from screenenv import DesktopAgentBase, Sandbox
from smolagents import Model, Tool, tool
from smolagents.monitoring import LogLevel
from typing import List

class CustomDesktopAgent(DesktopAgentBase):
    """Agent for desktop automation"""

    def __init__(
        self,
        model: Model,
        data_dir: str,
        desktop: Sandbox,
        tools: List[Tool] | None = None,
        max_steps: int = 200,
        verbosity_level: LogLevel = LogLevel.INFO,
        planning_interval: int | None = None,
        use_v1_prompt: bool = False,
        **kwargs,
    ):
        super().__init__(
            model=model,
            data_dir=data_dir,
            desktop=desktop,
            tools=tools,
            max_steps=max_steps,
            verbosity_level=verbosity_level,
            planning_interval=planning_interval,
            use_v1_prompt=use_v1_prompt,
            **kwargs,
        )

    def _setup_desktop_tools(self) -> None:
        """Define your custom tools here."""

        @tool
        def click(x: int, y: int) -> str:
            """
            Performs a left-click at the specified coordinates.
            Args:
                x: The x coordinate (pixels from the left edge)
                y: The y coordinate (pixels from the top edge)
            """
            self.desktop.left_click(x, y)
            return f"Clicked at ({x}, {y})"

        self.tools["click"] = click

        @tool
        def write(text: str) -> str:
            """
            Types the specified text at the current cursor position.
            Args:
                text: The text to type
            """
            self.desktop.write(text, delay_in_ms=10)
            return f"Typed text: '{text}'"

        self.tools["write"] = write

        @tool
        def press(key: str) -> str:
            """
            Presses a keyboard key or key combination.
            Args:
                key: The key to press (e.g. "enter", "space", "backspace") or a combination such as "ctrl+a" or "ctrl+shift+a"
            """
            self.desktop.press(key)
            return f"Pressed key: {key}"

        self.tools["press"] = press

        @tool
        def open(file_or_url: str) -> str:
            """
            Opens a URL in the default browser, or a file with its default application.
            Args:
                file_or_url: The URL or file path to open
            """
            self.desktop.open(file_or_url)
            self.logger.log(f"Opening: {file_or_url}")
            return f"Opened: {file_or_url}"

        self.tools["open"] = open

        @tool
        def launch_app(app_name: str) -> str:
            """
            Launches the specified application.
            Args:
                app_name: The name of the application to launch
            """
            self.desktop.launch(app_name)
            return f"Launched application: {app_name}"

        self.tools["launch_app"] = launch_app

        # ... add more tools as needed
3. Run the agent on a desktop task
from screenenv import Sandbox

sandbox = Sandbox(headless=False, resolution=(1920, 1080))

agent = CustomDesktopAgent(
    model=model,
    data_dir="data",
    desktop=sandbox,
)

task = "Open LibreOffice and write a report of about 300 words on the topic 'AI Agent Workflow for 2025', then save the document."
result = agent.run(task)
print(f"📄 Result: {result}")

sandbox.close()
If you hit a Docker permission error, try running the agent with sudo -E python ... or add your user to the docker group.
For a complete implementation, see the full CustomDesktopAgent source on GitHub.
Get started today
pip install screenenv

To try the example agent from source:

git clone git@github.com:huggingface/screenenv.git
cd screenenv
python -m examples.desktop_agent
What’s next?
Next, ScreenEnv will extend beyond Linux to support Android, macOS, and Windows, aiming to unlock true cross-platform GUI automation. This will allow developers and researchers to build agents that generalize across environments with minimal setup.
These advances also pave the way for reproducible, sandboxed environments that are ideal for benchmarking and evaluation.
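As a rough illustration of what that could look like, here is a minimal sketch of an evaluation loop that gives every task a fresh, identical sandbox. It reuses the CustomDesktopAgent and model from the previous section; the task list and result handling are hypothetical placeholders, not part of the library:

from screenenv import Sandbox

# Hypothetical benchmark tasks; a real suite would load these from a dataset.
TASKS = [
    "Open the text editor and save an empty file named notes.txt",
    "Launch the browser and navigate to https://huggingface.co",
]

results = []
for task in TASKS:
    # A fresh sandbox per task keeps runs isolated and reproducible.
    sandbox = Sandbox(headless=True, resolution=(1920, 1080))
    try:
        agent = CustomDesktopAgent(model=model, data_dir="data", desktop=sandbox)
        results.append((task, agent.run(task)))
    finally:
        sandbox.close()

for task, result in results:
    print(f"{task} -> {result}")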
Repository: https://github.com/huggingface/screenenv