In this tutorial, we will learn how to harness the power of a browser-driven AI agent entirely within Google Colab. We will use Playwright’s headless Chromium engine, together with the browser_use library’s high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google’s Gemini model via the langchain_google_genai connector to provide natural-language reasoning and decision-making, with pydantic’s SecretStr guarding the API key. With getpass managing credentials, asyncio orchestrating non-blocking execution, and optional .env support via python-dotenv, this setup gives you an end-to-end, interactive agent platform without ever leaving your notebook environment.
!apt-get update -qq
!apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
!pip install -qq playwright python-dotenv langchain-google-genai browser-use
!playwright install
We first refresh the system package lists and install headless Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain Google Generative AI connector, and browser-use, and finally download the necessary browser binaries via playwright install.
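Before moving on, it can help to confirm that the installs actually landed in the notebook's environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module names listed are simply the import names this tutorial relies on.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


# Import names (not pip package names) used later in this tutorial.
required = ["playwright", "dotenv", "langchain_google_genai", "browser_use"]

missing = missing_modules(required)
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, re-running the install cell (or restarting the Colab runtime) is usually the first thing to try.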
import os
import asyncio
from getpass import getpass
from pydantic import SecretStr
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
from browser_use.browser.browser import BrowserContext
We bring in the core Python utilities, os for environment management and asyncio for asynchronous execution, plus getpass and pydantic’s SecretStr for secure API-key input and storage. We then load LangChain’s Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.
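The introduction mentions optional .env support, although the tutorial code itself always prompts for the key. As a rough sketch of how the two could be combined, the hypothetical helper `resolve_api_key` below first checks the environment (after loading a .env file if python-dotenv happens to be installed) and only falls back to an interactive prompt when no key is found. The `environ` and `prompt` parameters are assumptions added here to keep the function easy to exercise without real credentials.

```python
import os
from getpass import getpass

# Optional: load variables from a local .env file if python-dotenv is present.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


def resolve_api_key(env_var="GEMINI_API_KEY", prompt=getpass, environ=None):
    """Return the key from the environment, or prompt for it if it is absent."""
    environ = os.environ if environ is None else environ
    key = (environ.get(env_var) or "").strip()
    if not key:
        # Fall back to a hidden interactive prompt.
        key = prompt(f"Enter your {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    return key
```

The returned string could then be wrapped as `SecretStr(resolve_api_key())` so that the key is masked when the object is printed or logged.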
os.environ["ANONYMIZED_TELEMETRY"] = "false"
We disable anonymous usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to “false”, so that the browser_use library does not send telemetry data back to its maintainers.
async def setup_browser(headless: bool = True):
    browser = Browser(config=BrowserConfig(headless=headless))
    context = BrowserContext(
        browser=browser,
        config=BrowserContextConfig(
            wait_for_network_idle_page_load_time=5.0,
            highlight_elements=True,
            save_recording_path="./recordings",
        )
    )
    return browser, context
This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network-idle page loads, visually highlight elements during interactions, and save a recording of each session under ./recordings. It then returns both the browser and its ready-to-use context for your agent’s tasks.
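The context above is configured to save recordings under ./recordings. Whether browser_use creates that directory on its own may depend on the installed version, so as a precaution the directory can be created up front. This is a small sketch with a hypothetical helper name, `ensure_dir`, using only the standard library.

```python
from pathlib import Path


def ensure_dir(path="./recordings"):
    """Create the directory (and any parents) if needed and return it as a Path."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_dir()` once before `setup_browser()` would be enough; the call is harmless if the directory already exists.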
async def agent_loop(llm, browser_context, query, initial_url=None):
    # Optionally open a starting tab before the agent begins reasoning.
    initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
    agent = Agent(
        task=query,
        llm=llm,
        browser_context=browser_context,
        use_vision=True,
        generate_gif=False,
        initial_actions=initial_actions,
    )
    result = await agent.run()
    return result.final_result() if result else None
This async helper encapsulates one “think-and-browse” cycle: it spins up an Agent configured with your LLM, the browser context, and an optional initial URL tab, uses vision when available, and disables GIF recording. When you call agent_loop, it runs the agent through its steps and returns the agent’s final result (or None if nothing is produced).
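The helper above builds a single `open_tab` action from one URL. As an illustration of how that list could be generalized, the hypothetical function `build_initial_actions` below produces one action per URL, reusing the same dictionary shape shown in `agent_loop`. Whether the installed browser_use version accepts several initial actions in this form is an assumption worth checking against its documentation.

```python
def build_initial_actions(urls):
    """Build one open_tab action per non-empty URL; return None if there are none."""
    actions = [
        {"open_tab": {"url": url.strip()}}
        for url in urls
        if url and url.strip()
    ]
    # Mirror agent_loop's convention of passing None when nothing should be opened.
    return actions or None
```

The result could be passed as `initial_actions=` in place of the single-URL list.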
async def main():
    # Prompt for the key without echoing it, then wrap it so it is masked when printed.
    raw_key = getpass("Enter your GEMINI_API_KEY: ")
    os.environ["GEMINI_API_KEY"] = raw_key
    api_key = SecretStr(raw_key)
    model_name = "gemini-2.5-flash-preview-04-17"
    llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)
    browser, context = await setup_browser(headless=True)
    try:
        while True:
            query = input("\nEnter prompt (or leave blank to exit): ").strip()
            if not query:
                break
            url = input("Optional URL to open first (or blank to skip): ").strip() or None
            print("\n🤖 Running agent…")
            answer = await agent_loop(llm, context, query, initial_url=url)
            print("\n📊 Search Results\n" + "-"*40)
            print(answer or "No results found")
            print("-"*40)
    finally:
        # Always release the browser, even if the loop exits with an error.
        print("Closing browser…")
        await browser.close()

# Top-level await works in Colab/Jupyter cells.
await main()
Finally, this main coroutine drives the entire Colab session: it securely prompts for your Gemini API key (using getpass and SecretStr), sets up the ChatGoogleGenerativeAI LLM and a headless Playwright browser context, then enters an interactive loop where it reads your natural-language prompts (and an optional start URL), invokes agent_loop to perform the browser-driven AI task, prints the results, and finally ensures the browser closes cleanly.
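One thing the interactive loop does not guard against is an agent run that never finishes. A possible safeguard, sketched here under the assumption that returning None on timeout is acceptable, is to wrap the awaited call in `asyncio.wait_for`. The helper name `run_with_timeout` is hypothetical and not part of browser_use.

```python
import asyncio


async def run_with_timeout(coro, seconds):
    """Await `coro`, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the wrapped coroutine before raising.
        return None
```

Inside the loop, the call could then look like `answer = await run_with_timeout(agent_loop(llm, context, query, initial_url=url), 300)`. Note that a timeout and an empty result both surface as None here, and that cancelling an agent mid-run may leave the page in a partially completed state.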
In conclusion, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you’re scraping real-time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain’s Gemini interface provides a flexible foundation for your next AI-powered project. Feel free to extend the agent’s capabilities, re-enable GIF recording, add custom navigation steps, or swap in other LLM backends to tailor the workflow precisely to your research or production needs.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.