In this tutorial, we walk through the process of building an advanced AI desktop automation agent that runs seamlessly in Google Colab. We design it to interpret natural language commands, simulate desktop tasks such as file operations, browser actions, and workflows, and provide interactive feedback through a virtual environment. By combining NLP, task execution, and a simulated desktop, we create a system that feels both intuitive and powerful, allowing us to experience automation concepts without relying on external APIs.
import re
import json
import time
import random
import threading
from datetime import datetime
from typing import Dict, List, Any, Tuple
from dataclasses import dataclass, asdict
from enum import Enum

# Notebook-only extras are optional; fall back gracefully outside Colab/Jupyter.
try:
    from IPython.display import display, HTML, clear_output
    import matplotlib.pyplot as plt
    import numpy as np
    COLAB_MODE = True
except ImportError:
    COLAB_MODE = False
We begin by importing essential Python libraries that support data handling, visualization, and simulation. We also set up Colab-specific tools inside a guarded import, so the tutorial runs interactively in a notebook and still works as a plain script.
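To illustrate how a flag set by a guarded import can be used later, here is a minimal sketch. The names `RICH_OUTPUT` and `show_message` are hypothetical and not part of the tutorial code; we only assume that `IPython.display` may or may not be installed.

```python
import html

# Optional dependency: available in notebooks, possibly missing elsewhere.
try:
    from IPython.display import display, HTML
    RICH_OUTPUT = True
except ImportError:
    RICH_OUTPUT = False


def show_message(text: str) -> str:
    """Hypothetical helper: render bold HTML when IPython is present, else print."""
    if RICH_OUTPUT:
        # Escape the text so it cannot inject markup into the rendered HTML.
        display(HTML(f"<b>{html.escape(text)}</b>"))
    else:
        print(text)
    return text


show_message("Agent ready")
```

The rest of a program can then branch on the flag instead of repeating the import check.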
class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    BROWSER_ACTION = "browser_action"
    SYSTEM_COMMAND = "system_command"
    APPLICATION_TASK = "application_task"
    WORKFLOW = "workflow"


@dataclass
class Task:
    id: str
    type: TaskType
    command: str
    status: str = "pending"
    result: str = ""
    timestamp: str = ""
    execution_time: float = 0.0
We define the structure of our automation system. We create an enum to categorize task types and a Task dataclass that helps us track each command with its details, status, and execution results.
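As a minimal sketch of how such a record might be used, the snippet below re-declares a trimmed-down version of the enum and dataclass described above and serializes one task to JSON. The helper `task_to_json` is hypothetical and not part of the tutorial code.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class TaskType(Enum):
    FILE_OPERATION = "file_operation"
    SYSTEM_COMMAND = "system_command"


@dataclass
class Task:
    id: str
    type: TaskType
    command: str
    status: str = "pending"
    result: str = ""


def task_to_json(task: Task) -> str:
    """Hypothetical helper: serialize a Task, storing the enum by its value."""
    record = asdict(task)
    record["type"] = task.type.value  # Enum members are not JSON-serializable
    return json.dumps(record)


task = Task(id="task_0001", type=TaskType.FILE_OPERATION, command="open report.txt")
task.status = "completed"  # fields with defaults can be updated after creation
print(task_to_json(task))
```

Storing the enum by its string value keeps the serialized record readable and easy to log.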
class VirtualDesktop:
    """Simulates a desktop environment with applications and file system"""

    def __init__(self):
        self.applications = {
            "browser": {"status": "closed", "tabs": [], "current_url": ""},
            "file_manager": {"status": "closed", "current_path": "/home/user"},
            "text_editor": {"status": "closed", "current_file": "", "content": ""},
            "email": {"status": "closed", "unread": 3, "inbox": []},
            "terminal": {"status": "closed", "history": []}
        }
        self.file_system = {
            "/home/user/": {
                "documents/": {
                    "report.txt": "Important quarterly report content...",
                    "notes.md": "# Meeting Notes\n- Project update\n- Budget review"
                },
                "downloads/": {
                    "data.csv": "name,age,city\nJohn,25,NYC\nJane,30,LA",
                    "image.jpg": "[Binary image data]"
                },
                "desktop/": {}
            }
        }
        self.screen_state = {
            "active_window": None,
            "mouse_position": (0, 0),
            "clipboard": ""
        }

    def get_system_info(self) -> Dict:
        # Values are randomized to mimic a live system; nothing is measured.
        return {
            "cpu_usage": random.randint(5, 25),
            "memory_usage": random.randint(30, 60),
            "disk_space": random.randint(60, 90),
            "network_status": "connected",
            "uptime": "2 hours 15 minutes"
        }


class NLPProcessor:
    """Processes natural language commands and extracts intents"""

    def __init__(self):
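        # NOTE: the original pattern table did not survive extraction; only the
        # fragment "show)\s+(system" remains. The patterns below are an
        # illustrative reconstruction (an assumption), not the author's originals.
        self.intent_patterns = {
            TaskType.FILE_OPERATION: [
                r"(open|create|delete|copy|move|list|find)\s+.*(file|folder|document)",
            ],
            TaskType.BROWSER_ACTION: [
                r"(open|visit|navigate|go to|search)\s+.*(browser|website|url|page)",
                r"(https?://|\.(com|org|net|edu)\b)",
            ],
            TaskType.SYSTEM_COMMAND: [
                r"(check|show)\s+(system|cpu|memory|disk|performance)",
            ],
            TaskType.APPLICATION_TASK: [
                r"(open|close|launch|start)\s+.*(application|app|editor|email|terminal|calculator)",
            ],
            TaskType.WORKFLOW: [
                r"(automate|workflow|batch|schedule)",
            ],
        }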

    def extract_intent(self, command: str) -> Tuple[TaskType, float]:
        """Extract task type and confidence from natural language command"""
        command_lower = command.lower()
        best_match = TaskType.SYSTEM_COMMAND  # default when nothing matches
        best_confidence = 0.0
        for task_type, patterns in self.intent_patterns.items():
            for pattern in patterns:
                if re.search(pattern, command_lower):
                    # Each regex hit contributes 0.3 to the confidence score.
                    confidence = len(re.findall(pattern, command_lower)) * 0.3
                    if confidence > best_confidence:
                        best_match = task_type
                        best_confidence = confidence
        return best_match, min(best_confidence, 1.0)

    def extract_parameters(self, command: str, task_type: TaskType) -> Dict[str, str]:
        """Extract parameters from command based on task type"""
        params = {}
        command_lower = command.lower()
        if task_type == TaskType.FILE_OPERATION:
            # A token with an extension, e.g. "report.txt"
            file_match = re.search(r'[\w/.-]+\.\w+', command)
            if file_match:
                params['filename'] = file_match.group()
            # An absolute path, e.g. "/home/user/documents"
            path_match = re.search(r'/[\w/.-]+', command)
            if path_match:
                params['path'] = path_match.group()
        elif task_type == TaskType.BROWSER_ACTION:
            url_match = re.search(r'https?://[\w.-]+|[\w.-]+\.(com|org|net|edu)', command)
            if url_match:
                params['url'] = url_match.group()
            search_match = re.search(r'(?:search|find|google)\s+["\']?([^"\']+)["\']?', command_lower)
            if search_match:
                params['query'] = search_match.group(1)
        elif task_type == TaskType.APPLICATION_TASK:
            app_match = re.search(r'(browser|editor|email|terminal|calculator)', command_lower)
            if app_match:
                params['application'] = app_match.group(1)
        return params
We simulate a virtual desktop with applications, a file system, and system states while also building an NLP processor. We establish regex rules to identify user intents from natural language commands and extract useful parameters, such as filenames, URLs, or application names. This allows us to bridge natural language input with structured automation tasks.
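The scoring idea can be tried in isolation. The following is a minimal sketch, assuming a small hypothetical pattern table (`INTENT_PATTERNS`) and plain string labels instead of the enum; `classify` and `extract_filename` are illustrative names, not the tutorial's methods.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern table, for illustration only.
INTENT_PATTERNS: Dict[str, List[str]] = {
    "file_operation": [r"\b(open|create|list)\b.*\b(file|files|folder)\b"],
    "browser_action": [r"\b(browser|navigate|search)\b"],
    "workflow": [r"\b(automate|workflow)\b"],
}


def classify(command: str, default: str = "system_command") -> Tuple[str, float]:
    """Return (intent, confidence); each regex hit adds 0.3, capped at 1.0."""
    text = command.lower()
    best_intent, best_score = default, 0.0
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            score = len(re.findall(pattern, text)) * 0.3
            if score > best_score:  # ties keep the earlier intent
                best_intent, best_score = intent, score
    return best_intent, min(best_score, 1.0)


def extract_filename(command: str) -> Optional[str]:
    """Return the first token that looks like name.ext, or None."""
    match = re.search(r"[\w/.-]+\.\w+", command)
    return match.group() if match else None


print(classify("create a new file called notes.txt"))
print(extract_filename("create a new file called notes.txt"))
```

Because the score only counts regex hits, it is a rough ranking signal rather than a calibrated probability, and commands that match nothing fall back to the default intent with a confidence of 0.0.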
class TaskExecutor:
    """Executes tasks on the virtual desktop"""

    def __init__(self, desktop: VirtualDesktop):
        self.desktop = desktop
        self.execution_log = []

    def execute_file_operation(self, params: Dict[str, str], command: str) -> str:
        """Simulate file operations"""
        if "open" in command.lower():
            filename = params.get('filename', 'unknown.txt')
            return f"✓ Opened file: {filename}\n📁 File contents loaded in text editor"
        elif "create" in command.lower():
            filename = params.get('filename', 'new_file.txt')
            return f"✓ Created new file: {filename}\n📝 File ready for editing"
        elif "list" in command.lower():
            # The file system is nested: root folder first, then "documents/".
            files = list(self.desktop.file_system["/home/user/"]["documents/"].keys())
            return "📂 Files found:\n" + "\n".join([f"   • {f}" for f in files])
        return "✓ File operation completed successfully"

    def execute_browser_action(self, params: Dict[str, str], command: str) -> str:
        """Simulate browser actions"""
        if "open" in command.lower() or "go to" in command.lower():
            url = params.get('url', 'example.com')
            self.desktop.applications["browser"]["current_url"] = url
            self.desktop.applications["browser"]["status"] = "open"
            return f"🌐 Navigated to: {url}\n✓ Page loaded successfully"
        elif "search" in command.lower():
            query = params.get('query', 'search term')
            return f"🔍 Searching for: '{query}'\n✓ Found 1,247 results"
        return "✓ Browser action completed"

    def execute_system_command(self, params: Dict[str, str], command: str) -> str:
        """Simulate system commands"""
        if "check" in command.lower() or "show" in command.lower():
            info = self.desktop.get_system_info()
            return (
                "💻 System Status:\n"
                f"   CPU: {info['cpu_usage']}%\n"
                f"   Memory: {info['memory_usage']}%\n"
                f"   Disk: {info['disk_space']}% used\n"
                f"   Network: {info['network_status']}"
            )
        return "✓ System command executed"

    def execute_application_task(self, params: Dict[str, str], command: str) -> str:
        """Simulate application tasks"""
        app = params.get('application', 'unknown')
        if "open" in command.lower():
            # Guard the lookup: names such as 'editor' or 'unknown' are not keys.
            if app in self.desktop.applications:
                self.desktop.applications[app]["status"] = "open"
            return f"🚀 Launched {app.title()}\n✓ Application ready for use"
        elif "close" in command.lower():
            if app in self.desktop.applications:
                self.desktop.applications[app]["status"] = "closed"
            return f"❌ Closed {app.title()}"
        return f"✓ {app.title()} task completed"

    def execute_workflow(self, params: Dict[str, str], command: str) -> str:
        """Simulate complex workflow execution"""
        steps = [
            "Analyzing workflow requirements...",
            "Preparing automation steps...",
            "Executing batch operations...",
            "Validating results...",
            "Generating report..."
        ]
        result = "🔄 Workflow Execution:\n"
        for i, step in enumerate(steps, 1):
            result += f"   {i}. {step} ✓\n"
            if COLAB_MODE:
                time.sleep(0.1)  # short pause so the steps feel sequential
        return result + "✅ Workflow completed successfully!"


class DesktopAgent:
    """Main desktop automation agent class - coordinates all components"""

    def __init__(self):
        self.desktop = VirtualDesktop()
        self.nlp = NLPProcessor()
        self.executor = TaskExecutor(self.desktop)
        self.task_history = []
        self.active = True
        self.stats = {
            "tasks_completed": 0,
            "success_rate": 100.0,
            "average_execution_time": 0.0
        }

    def process_command(self, command: str) -> Task:
        """Process a natural language command and execute it"""
        start_time = time.time()
        task_id = f"task_{len(self.task_history) + 1:04d}"
        task_type, confidence = self.nlp.extract_intent(command)
        task = Task(
            id=task_id,
            type=task_type,
            command=command,
            timestamp=datetime.now().strftime("%H:%M:%S")
        )
        try:
            params = self.nlp.extract_parameters(command, task_type)
            if task_type == TaskType.FILE_OPERATION:
                result = self.executor.execute_file_operation(params, command)
            elif task_type == TaskType.BROWSER_ACTION:
                result = self.executor.execute_browser_action(params, command)
            elif task_type == TaskType.SYSTEM_COMMAND:
                result = self.executor.execute_system_command(params, command)
            elif task_type == TaskType.APPLICATION_TASK:
                result = self.executor.execute_application_task(params, command)
            elif task_type == TaskType.WORKFLOW:
                result = self.executor.execute_workflow(params, command)
            else:
                result = "⚠️ Command type not recognized"
            task.status = "completed"
            task.result = result
            self.stats["tasks_completed"] += 1
        except Exception as e:
            # Any executor error marks the task as failed instead of crashing.
            task.status = "failed"
            task.result = f"❌ Error: {str(e)}"
        task.execution_time = round(time.time() - start_time, 3)
        self.task_history.append(task)
        self.update_stats()
        return task

    def update_stats(self):
        """Update agent statistics"""
        if self.task_history:
            successful_tasks = sum(1 for t in self.task_history if t.status == "completed")
            self.stats["success_rate"] = round((successful_tasks / len(self.task_history)) * 100, 1)
            total_time = sum(t.execution_time for t in self.task_history)
            self.stats["average_execution_time"] = round(total_time / len(self.task_history), 3)

    def get_status_dashboard(self) -> str:
        """Generate a status dashboard"""
        recent_tasks = self.task_history[-5:] if self.task_history else []
        dashboard = f"""
╭━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╮
│  🤖 AI DESKTOP AGENT STATUS                          │
├──────────────────────────────────────────────────────┤
│  📊 Statistics:                                      │
│    • Tasks Completed: {self.stats['tasks_completed']:<10}                     │
│    • Success Rate: {self.stats['success_rate']:<10}%                       │
│    • Avg Exec Time: {self.stats['average_execution_time']:<10}s                      │
├──────────────────────────────────────────────────────┤
│  🖥️ Desktop Applications:                            │
"""
        for app, info in self.desktop.applications.items():
            status_icon = "🟢" if info["status"] == "open" else "🔴"
            dashboard += f"│    {status_icon} {app.title():<12} ({info['status']:<6})                    │\n"
        dashboard += "├──────────────────────────────────────────────────────┤\n"
        dashboard += "│  📋 Recent Tasks:                                    │\n"
        if recent_tasks:
            for task in recent_tasks:
                status_icon = "✅" if task.status == "completed" else "❌"
                dashboard += f"│    {status_icon} {task.timestamp} - {task.type.value:<15}           │\n"
        else:
            dashboard += "│    No tasks executed yet                             │\n"
        dashboard += "╰━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╯"
        return dashboard
We implement the executor that turns our parsed intents into concrete actions and realistic outputs on the virtual desktop. We then wire everything together in the DesktopAgent, where we process natural language, execute tasks, and continuously track the success rate and average execution time, which feed a live status dashboard.
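As a design note, the routing step can also be written as a lookup table rather than a conditional chain. The sketch below is an alternative illustration, not the tutorial's implementation: `HANDLERS`, `run`, and `success_rate` are hypothetical names, and the handlers are stand-ins that either succeed or raise.

```python
import time
from typing import Callable, Dict, List


def handle_file(command: str) -> str:
    return f"file handler received: {command}"


def handle_system(command: str) -> str:
    raise RuntimeError("simulated failure")  # stand-in for a failing task


# Dispatch table: task type label -> handler function (hypothetical names).
HANDLERS: Dict[str, Callable[[str], str]] = {
    "file_operation": handle_file,
    "system_command": handle_system,
}

history: List[dict] = []


def run(task_type: str, command: str) -> dict:
    """Route a command to its handler and record status and timing."""
    start = time.time()
    record = {"type": task_type, "command": command}
    try:
        record["result"] = HANDLERS[task_type](command)
        record["status"] = "completed"
    except Exception as exc:  # unknown types and handler errors both fail here
        record["result"] = f"Error: {exc}"
        record["status"] = "failed"
    record["execution_time"] = round(time.time() - start, 3)
    history.append(record)
    return record


def success_rate() -> float:
    """Percentage of recorded tasks that completed, rounded to one decimal."""
    if not history:
        return 100.0
    done = sum(1 for r in history if r["status"] == "completed")
    return round(done / len(history) * 100, 1)


run("file_operation", "open report.txt")
run("system_command", "check system status")
print(success_rate())
```

With a table, adding a task type means registering one more handler, while the error handling and statistics stay in a single place.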
def run_advanced_demo():
    """Run an advanced interactive demo of the AI Desktop Agent"""
    print("🚀 Initializing Advanced AI Desktop Automation Agent...")
    time.sleep(1)
    agent = DesktopAgent()
    print("\n" + "="*60)
    print("🤖 AI DESKTOP AUTOMATION AGENT - ADVANCED TUTORIAL")
    print("="*60)
    print("An advanced AI agent that understands natural language")
    print("commands and automates desktop tasks in a simulated environment.")
    print("\n💡 Try these example commands:")
    print("   • 'open the browser and go to github.com'")
    print("   • 'create a new file called report.txt'")
    print("   • 'check system performance'")
    print("   • 'show me the files in documents folder'")
    print("   • 'automate email processing workflow'")
    demo_commands = [
        "check system status and show CPU usage",
        "open browser and navigate to github.com",
        "create a new file called meeting_notes.txt",
        "list all files in the documents directory",
        "launch text editor application",
        "automate data backup workflow"
    ]
    print(f"\n🎯 Running {len(demo_commands)} demonstration commands...\n")
    for i, command in enumerate(demo_commands, 1):
        print(f"[{i}/{len(demo_commands)}] Command: '{command}'")
        print("-" * 50)
        task = agent.process_command(command)
        print(f"Task ID: {task.id}")
        print(f"Type: {task.type.value}")
        print(f"Status: {task.status}")
        print(f"Execution Time: {task.execution_time}s")
        print(f"Result:\n{task.result}")
        print()
        if COLAB_MODE:
            time.sleep(0.5)  # pace the output in a notebook
    print("\n" + "="*60)
    print("📊 FINAL AGENT STATUS")
    print("="*60)
    print(agent.get_status_dashboard())
    return agent


def interactive_mode(agent):
    """Run interactive mode for user input"""
    print("\n🎮 INTERACTIVE MODE ACTIVATED")
    print("Type your commands below (type 'quit' to exit, 'status' for dashboard):")
    print("-" * 60)
    while True:
        try:
            user_input = input("\n🤖 Agent> ").strip()
            if user_input.lower() in ['quit', 'exit', 'q']:
                print("👋 AI Agent shutting down. Goodbye!")
                break
            elif user_input.lower() in ['status', 'dashboard']:
                print(agent.get_status_dashboard())
                continue
            elif user_input.lower() in ['help', '?']:
                print("📚 Available commands:")
                print("   • Any natural language command")
                print("   • 'status' - Show agent dashboard")
                print("   • 'help' - Show this help")
                print("   • 'quit' - Exit AI Agent")
                continue
            elif not user_input:
                continue
            print(f"Processing: '{user_input}'...")
            task = agent.process_command(user_input)
            print(f"\n✨ Task {task.id} [{task.type.value}] - {task.status}")
            print(task.result)
        except KeyboardInterrupt:
            print("\n\n👋 AI Agent interrupted. Goodbye!")
            break
        except Exception as e:
            print(f"❌ Error: {e}")


if __name__ == "__main__":
    agent = run_advanced_demo()
    if COLAB_MODE:
        # In a notebook, let the user decide when to start the blocking loop.
        print("\n🎮 To continue with interactive mode, run:")
        print("interactive_mode(agent)")
    else:
        interactive_mode(agent)
We run a scripted demo that processes realistic commands, prints results, and finishes with a live status dashboard. We then provide an interactive loop where we type natural language tasks, check the status, and receive immediate feedback. Finally, we auto-start the demo and, in Colab, we show how to launch interactive mode with a single call.
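An interactive loop that blocks on keyboard input is awkward to exercise automatically. As a minimal sketch of one way around that, the loop below reads from any iterable of lines, so it can be driven by a scripted list. The names `repl` and `echo_handler` are hypothetical, and the handler simply echoes its input instead of calling the agent.

```python
from typing import Callable, Iterable, List


def repl(handle: Callable[[str], str], lines: Iterable[str]) -> List[str]:
    """Feed lines to `handle` until a quit word appears; return the outputs."""
    outputs: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank input, as the interactive loop does
        if text.lower() in ("quit", "exit", "q"):
            outputs.append("Goodbye!")
            break
        outputs.append(handle(text))
    return outputs


def echo_handler(command: str) -> str:
    """Stand-in for the agent: report what it was asked to process."""
    return f"processed: {command}"


transcript = repl(echo_handler, ["check system status", "", "quit", "never reached"])
print(transcript)
```

Passing the input source as a parameter keeps the loop logic identical for scripted runs and for live use, where the lines would come from the keyboard.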
In conclusion, we demonstrate how an AI agent can handle a wide variety of desktop-like tasks in a simulated environment using only Python. We see how natural language inputs are translated into structured tasks, executed with realistic outputs, and summarized in a visual dashboard. With this foundation, we position ourselves to extend the agent with more complex behaviors, richer interfaces, and real-world integrations, making desktop automation smarter, more interactive, and easier to use.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.