In this tutorial, we demonstrate how to build an AI-powered PDF interaction system in Google Colab using Gemini Flash 1.5, PyMuPDF, and the Google Generative AI API. With these tools, we can upload a PDF, extract its text, and ask questions about it interactively, receiving answers from Google's Gemini Flash 1.5 model.
!pip install -q -U google-generativeai PyMuPDF python-dotenv
First, we install the dependencies required to build an AI-powered PDF Q&A system in Google Colab. google-generativeai provides access to Gemini Flash 1.5, enabling natural language interactions, while PyMuPDF (also known as Fitz) allows efficient text extraction from PDFs. Additionally, python-dotenv helps manage environment variables, such as API keys, securely within the notebook.
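Before moving on, it can help to confirm that the installation succeeded. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are taken from the pip command above.
    wanted = ["google-generativeai", "PyMuPDF", "python-dotenv"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any package is reported as not installed, re-running the install cell is usually the simplest fix.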
from google.colab import files
uploaded = files.upload()
Here we upload files from the local system to Google Colab. When executed, the cell opens a file selection dialog, allowing you to choose a file (e.g., a PDF) to upload. The uploaded file is stored in a dictionary-like object (uploaded), where keys are file names and values contain each file's binary data. This step is essential for directly processing documents, datasets, or model weights in a Colab environment.
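Because the upload result is keyed by file name, the path of the PDF can be derived from it rather than typed by hand. The sketch below assumes Colab's default working directory is /content; the helper `pick_uploaded_pdf` is hypothetical, and a plain dictionary stands in for the object returned by the upload call so the example runs anywhere.

```python
def pick_uploaded_pdf(uploaded):
    """Return the name of the first uploaded file ending in .pdf (case-insensitive)."""
    for name in uploaded:
        if name.lower().endswith(".pdf"):
            return name
    raise ValueError("No PDF found among the uploaded files.")


# In Colab, `uploaded` would come from files.upload(); a plain dict stands in here.
example_upload = {"notes.txt": b"hello", "Paper.pdf": b"%PDF-1.7 ..."}
pdf_name = pick_uploaded_pdf(example_upload)
pdf_file_path = f"/content/{pdf_name}"  # assumption: default Colab working directory
print(pdf_file_path)
```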
import fitz  # PyMuPDF

def extract_pdf_text(pdf_path):
    # Open the PDF and concatenate the text of every page
    doc = fitz.open(pdf_path)
    full_text = ""
    for page in doc:
        full_text += page.get_text()
    return full_text

pdf_file_path = "/content/Paper.pdf"
document_text = extract_pdf_text(pdf_path=pdf_file_path)
print("Document text extracted!")
print(document_text[:1000])
We use PyMuPDF (fitz) to extract text from a PDF file in Google Colab. The function extract_pdf_text(pdf_path) opens the PDF, iterates through its pages, and retrieves the text content of each one. The extracted text is then stored in document_text, and the first 1000 characters are printed as a preview. This step is essential for enabling text-based analysis and AI-driven question answering over PDFs.
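A possible variation keeps track of page numbers, which can make the model's answers easier to trace back to the source. This is a sketch rather than the tutorial's method: the helpers `extract_text_by_page` and `join_pages` are hypothetical, and they accept any iterable of objects exposing `get_text()`, which is how PyMuPDF pages behave.

```python
def extract_text_by_page(doc):
    """Return a list of (page_number, text) pairs for an iterable of page objects."""
    pages = []
    for number, page in enumerate(doc, start=1):
        pages.append((number, page.get_text()))
    return pages


def join_pages(pages):
    """Join non-empty pages into one string, each prefixed with a page marker."""
    parts = [f"[Page {n}]\n{text.strip()}" for n, text in pages if text.strip()]
    return "\n\n".join(parts)


# Usage in Colab (assumes PyMuPDF is installed and the file exists):
# import fitz
# with fitz.open("/content/Paper.pdf") as doc:
#     document_text = join_pages(extract_text_by_page(doc))
```

Opening the document in a `with` block also closes the file once extraction is done.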
import os
os.environ["GOOGLE_API_KEY"] = 'Use your own API key here'
We set the Google API key as an environment variable in Google Colab. The API key is required to authenticate requests to Google Generative AI, allowing access to Gemini Flash 1.5 for AI-powered text processing. Replace 'Use your own API key here' with a valid key so that the model can generate responses from within the notebook.
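Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. One alternative, sketched below under the assumption that the key may already be present in the environment, is to prompt for it only when it is missing. The helper `resolve_api_key` is hypothetical and not part of the original notebook.

```python
import getpass
import os


def resolve_api_key(env=None, ask=getpass.getpass, name="GOOGLE_API_KEY"):
    """Return the key from the environment, prompting only when it is absent."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # getpass hides the typed value so it does not appear in the cell output
        key = ask(f"Enter {name}: ").strip()
    if not key:
        raise ValueError(f"{name} was not provided.")
    return key


# Usage in a notebook (uncomment to run interactively):
# from dotenv import load_dotenv  # optional: loads a local .env file into os.environ
# load_dotenv()
# os.environ["GOOGLE_API_KEY"] = resolve_api_key()
```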
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model_name = "models/gemini-1.5-flash-001"

def query_gemini_flash(question, context):
    model = genai.GenerativeModel(model_name=model_name)
    # Only the first 20,000 characters of the document are sent as context
    prompt = f"""
Context: {context[:20000]}
Question: {question}
Answer:
"""
    response = model.generate_content(prompt)
    return response.text

pdf_text = extract_pdf_text("/content/Paper.pdf")
question = "Summarize the key findings of this document."
answer = query_gemini_flash(question, pdf_text)
print("Gemini Flash Answer:")
print(answer)
Finally, we configure and query Gemini Flash 1.5 using the PDF document. The code initializes the genai library with the API key and loads the Gemini Flash 1.5 model (gemini-1.5-flash-001). The query_gemini_flash() function takes a question and the extracted PDF text as input, builds a structured prompt from the question and the first 20,000 characters of the text, and returns the AI-generated response. This setup enables automated document summarization and Q&A over PDFs.
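Since the prompt includes only the first 20,000 characters of the document, longer PDFs are silently truncated. One way around this, sketched here as an assumption rather than as part of the original notebook, is to split the text into overlapping chunks and query each one. The names `chunk_text`, `build_prompt`, and `answer_over_chunks` are hypothetical, and `ask_model` stands for any callable that sends a prompt to the model and returns its text.

```python
def chunk_text(text, size=20000, overlap=500):
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size).")
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


def build_prompt(question, context):
    """Format one chunk of context and a question in the tutorial's prompt layout."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


def answer_over_chunks(question, text, ask_model, size=20000, overlap=500):
    """Query the model once per chunk and return the list of partial answers."""
    return [ask_model(build_prompt(question, chunk))
            for chunk in chunk_text(text, size, overlap)]


# Usage with the tutorial's model (assumes genai is configured as shown earlier):
# model = genai.GenerativeModel(model_name=model_name)
# partial = answer_over_chunks(question, pdf_text,
#                              lambda p: model.generate_content(p).text)
```

The partial answers could then be combined with one more model call, at the cost of additional requests.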
In conclusion, by following this tutorial, we have built an interactive PDF question-answering system in Google Colab using Gemini Flash 1.5, PyMuPDF, and the Google Generative AI API. This solution allows users to extract information from PDFs and query them interactively. The combination of Google's AI models and Colab's cloud-based environment provides a powerful and accessible way to process documents without requiring heavy local computational resources.
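The interactive querying described here can be sketched as a simple loop that keeps asking for questions until the user exits. The helper `chat_loop` is hypothetical; `answer_fn` stands for any function that maps a question to an answer, such as a wrapper around query_gemini_flash().

```python
def chat_loop(answer_fn, read=input, write=print):
    """Answer questions until the user types 'exit' or an empty line.

    Returns the number of questions answered.
    """
    answered = 0
    while True:
        question = read("Ask a question about the PDF (or 'exit'): ").strip()
        if not question or question.lower() == "exit":
            break
        write(answer_fn(question))
        answered += 1
    return answered


# Usage in the notebook, reusing names from the tutorial:
# chat_loop(lambda q: query_gemini_flash(q, pdf_text))
```

Passing `read` and `write` as parameters keeps the loop easy to test without a live model or keyboard input.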
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.