AI functions hardly ever cope with one clear desk. They combine consumer profiles, chat logs, JSON metadata, embeddings, and typically spatial information. Most groups reply this with a patchwork of an OLTP database, a vector retailer, and a search engine. OceanBase launched seekdb, an open supply AI centered database (beneath the Apache 2.0 license). seekdb is described as an AI native search database that unifies relational information, vector information, textual content, JSON, and GIS in a single engine and exposes hybrid search and in database AI workflows.Â
What’s seekdb?
seekdb is positioned because the light-weight, embedded model of the OceanBase engine, aimed toward AI functions slightly than common goal distributed deployments. It runs as a single node database, helps embedded mode and shopper or server mode, and stays suitable with MySQL drivers and SQL syntax.
Within the functionality matrix, seekdb is marked as:
- Embedded database supported
- Standalone database supported
- Distributed database not supported
whereas the total OceanBase product covers the distributed case.
From a knowledge mannequin perspective, seekdb helps:
- Relational information with normal SQL
- Vector search
- Full textual content search
- JSON information
- Spatial GIS information
all inside one storage and indexing layer.
Hybrid search because the core function
The primary function OceanBase pushes is hybrid search. That is search that mixes vector primarily based semantic retrieval, full textual content key phrase retrieval, and scalar filters in a single question and a single rating step.
seekdb implements hybrid search by way of a system package deal named DBMS_HYBRID_SEARCH with two entry factors:
- DBMS_HYBRID_SEARCH.SEARCH which returns outcomes as JSON, sorted by relevance
- DBMS_HYBRID_SEARCH.GET_SQL which returns the concrete SQL string used for execution
The hybrid search path can run:
- pure vector search
- pure full textual content search
- mixed hybrid search
and may push relational filters and joins down into storage. It additionally helps question reranking methods like weighted scores and reciprocal rank fusion and may plug in giant language mannequin primarily based re-rankers.
For retrieval augmented technology (RAG) and agent reminiscence, this implies you possibly can write a single SQL question that does semantic matching on embeddings, actual matching on product codes or correct nouns, and relational filtering on consumer or tenant scopes.
Vector and full textual content engine particulars
At its core, seekdb exposes a trendy vector and full textual content stack.
For vectors, seekdb:
- helps dense vectors and sparse vectors
- helps Manhattan, Euclidean, interior product, and cosine distance metrics
- gives in reminiscence index varieties akin to HNSW, HNSW SQ, HNSW BQ
- gives disk primarily based index varieties together with IVF and IVF PQ
Hybrid vector index present how one can retailer uncooked textual content, let seekdb name an embedding mannequin robotically, and have the system preserve the corresponding vector index and not using a separate preprocessing pipeline.
For textual content, seekdb provides full textual content search with:
- key phrase, phrase, and Boolean queries
- BM25 rating for relevance
- a number of tokenizer modes
The important thing level is that full textual content and vector indexes are first-class and are built-in in the identical question planner as scalar indexes and GIS indexes, so hybrid search doesn’t want exterior orchestration.
AI features contained in the database
seekdb consists of in-built AI operate expressions that allow you to name fashions straight from SQL, and not using a separate utility service mediating each name. The primary features are:
- AI_EMBED to transform textual content into embeddings
- AI_COMPLETE for textual content technology utilizing a chat or completion mannequin
- AI_RERANK to rerank an inventory of candidates
AI_PROMPT to assemble immediate templates and dynamic values right into a JSON object for AI_COMPLETE
Mannequin metadata and endpoints are managed by the DBMS_AI_SERVICE package deal, which helps you to register exterior suppliers, set URLs, and configure keys, all on the database aspect.Â
Multimodal information and workloads
seekdb is constructed to deal with a number of information modalities in a single node. it has a multimodal information and indexing layer that covers vectors, textual content, JSON, and GIS, and a multi-model compute layer for hybrid workloads throughout vector, full textual content, and scalar situations.
It additionally gives JSON indexes for metadata queries and GIS indexes for spatial situations. This permits queries like:
- discover semantically related paperwork
- filter by JSON metadata like tenant, area, or class
- constrain by spatial vary or polygon
with out leaving the identical engine.
As a result of seekdb is derived from the OceanBase engine, it inherits ACID transactions, row and column hybrid storage, and vectorized execution, though excessive scale distributed deployments stay a job for the total OceanBase database.
Comparability Desk


Key Takeaways
- AI native hybrid search: seekdb unifies vector search, full textual content search and relational filtering in a single SQL and DBMS_HYBRID_SEARCH interface, so RAG and agent workloads can run multi sign retrieval in a single question as an alternative of sewing collectively a number of engines.
- Multimodal information in a single engine: seekdb shops and indexes relational information, vectors, textual content, JSON and GIS in the identical engine, which lets AI functions maintain paperwork, embeddings and metadata constant with out sustaining separate databases.
- In database AI features for RAG: With AI_EMBED, AI_COMPLETE, AI_RERANK and AI_PROMPT, seekdb can name embedding fashions, LLMs and rerankers straight from SQL, which simplifies RAG pipelines and strikes extra orchestration logic into the database layer.
- Single node, embedded pleasant design: seekdb is a single node, MySQL suitable engine that helps embedded and standalone modes, whereas distributed, giant scale deployments stay the position of full OceanBase, which makes seekdb appropriate for native, edge and repair embedded AI workloads.
- Open supply and power ecosystem: seekdb is open sourced beneath Apache 2.0 and integrates with a rising ecosystem of AI instruments and frameworks, with Python assist by way of pyseekdb and MCP primarily based integration for code assistants and brokers, so it will possibly act as a unified information aircraft for AI functions.
Try the Repo and Venture. Be happy to take a look at our GitHub Web page for Tutorials, Codes and Notebooks. Additionally, be at liberty to comply with us on Twitter and don’t neglect to hitch our 100k+ ML SubReddit and Subscribe to our Publication. Wait! are you on telegram? now you possibly can be a part of us on telegram as properly.
Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is dedicated to harnessing the potential of Synthetic Intelligence for social good. His most up-to-date endeavor is the launch of an Synthetic Intelligence Media Platform, Marktechpost, which stands out for its in-depth protection of machine studying and deep studying information that’s each technically sound and simply comprehensible by a large viewers. The platform boasts of over 2 million month-to-month views, illustrating its recognition amongst audiences.
