Add a Knowledge Base

Upload documents and connect them to your agent.

This guide covers how to upload documents to Society AI, connect them to your agent as a knowledge base, and let your agent search and cite those documents when answering questions. By the end, your agent will be able to answer questions grounded in your own data.

Overview

A knowledge base (KB) lets your agent answer questions using your own documents rather than only its training data. When a user asks a question, the agent searches the KB for relevant passages, includes them as context, and cites the source files in its response.

Society AI supports two types of KB sources:

| Source Type | Description | Use Case |
| --- | --- | --- |
| Agent documents | Files uploaded directly to an agent | Product docs, FAQs, research papers |
| Workspace connections | Spaces or projects in your organization | Shared team knowledge, project documentation |

Both types are searched together at query time, and results are merged and ranked by relevance.

How It Works

User sends a question
        |
        v
Agent receives task
        |
        v
search_knowledge_base(query)
        |
        v
KB Service resolves agent's configured sources
        |
        v
Fan-out search across all sources
        |
        v
Merge, deduplicate, rank by score
        |
        v
Return top-K chunks with source info
        |
        v
Agent uses chunks as context for its response

The KB search is a tool the agent calls when it needs to find information. The agent's system prompt instructs it to cite sources using the format [Source: filename].
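The fan-out, merge, and rank steps in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the KB service's implementation: each configured source is modeled as a hypothetical async search function, and a "duplicate" is assumed to mean the same passage from the same file, in which case the highest-scoring copy is kept.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float  # relevance, 0-1, higher is better
    source_file: str


# Hypothetical per-source search: takes (query, top_k), returns scored chunks.
SearchFn = Callable[[str, int], Awaitable[list[Chunk]]]


async def search_knowledge_base(
    query: str, sources: list[SearchFn], top_k: int = 5
) -> list[Chunk]:
    # Fan out: query every configured source concurrently.
    per_source = await asyncio.gather(*(search(query, top_k) for search in sources))

    # Merge and deduplicate. The same passage can come back from more than one
    # source (for example a space and a project inside it); keep the best score.
    best: dict[tuple[str, str], Chunk] = {}
    for results in per_source:
        for chunk in results:
            key = (chunk.source_file, chunk.text)
            if key not in best or chunk.score > best[key].score:
                best[key] = chunk

    # Rank by relevance, highest first, and keep the top-K.
    ranked = sorted(best.values(), key=lambda c: c.score, reverse=True)
    return ranked[:top_k]
```

Running the per-source searches with asyncio.gather means overall latency tracks the slowest source rather than the sum of all of them.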

Step 1 -- Upload Documents

Via the Dashboard

The simplest way to add documents is through the Society AI dashboard:

  1. Go to societyai.com
  2. Navigate to your agent's settings
  3. Open the Knowledge Base section
  4. Upload files (PDF, TXT, DOCX, and other common formats)

Files are indexed automatically after upload. Indexing typically takes a few seconds to a few minutes depending on file size.

Via the API

You can also upload documents programmatically through the platform API. Files are stored and indexed in Google File Search, and the KB service handles retrieval.

Step 2 -- Connect KB to Your Agent

Config Agents (Agent Builder)

For agents built with the Agent Builder UI, add KB sources to the agent configuration. There are two source types:

Agent documents -- Search files uploaded directly to this agent:

{
  "kb_sources": [
    {
      "type": "agent",
      "agent_id": "my-agent"
    }
  ]
}

Workspace connections -- Search documents from a space or project:

{
  "workspace": [
    {
      "type": "space",
      "space_id": "your-space-uuid",
      "name": "Engineering Docs"
    },
    {
      "type": "project",
      "space_id": "your-space-uuid",
      "project_id": "your-project-uuid",
      "name": "Q1 Research"
    }
  ]
}

You can configure both kb_sources and workspace together. The agent searches all configured sources and merges the results.

When either kb_sources or workspace is configured, the platform automatically registers a search_knowledge_base tool on the agent. No code changes are needed -- the agent receives the tool and can call it during conversations.
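Because tool registration depends only on what is configured, the rule can be sketched as a small validator. This is a hypothetical helper, not a platform API: the required fields per source type are inferred from the JSON examples above, and the platform's own validation may differ.

```python
from typing import Any

# Required ID fields per (section, type), inferred from the config examples.
REQUIRED_FIELDS = {
    ("kb_sources", "agent"): ("agent_id",),
    ("workspace", "space"): ("space_id",),
    ("workspace", "project"): ("space_id", "project_id"),
}


def resolve_sources(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten kb_sources and workspace entries into one validated list."""
    resolved = []
    for section in ("kb_sources", "workspace"):
        for entry in config.get(section, []):
            key = (section, entry.get("type"))
            if key not in REQUIRED_FIELDS:
                raise ValueError(f"unknown source type {entry.get('type')!r} in {section}")
            missing = [f for f in REQUIRED_FIELDS[key] if not entry.get(f)]
            if missing:
                raise ValueError(f"{section} entry of type {entry['type']!r} is missing {missing}")
            resolved.append(entry)
    return resolved


def registers_kb_tool(config: dict[str, Any]) -> bool:
    """True when at least one source is configured, i.e. the tool would be attached."""
    return bool(resolve_sources(config))
```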

Self-Hosted Agents (Python SDK)

For self-hosted agents using the Python SDK, knowledge base integration works differently. Your agent needs to implement its own retrieval logic, since it runs outside the platform infrastructure.

A common approach is to use a retrieval library (like LlamaIndex, LangChain, or ChromaDB) to index and search your documents locally:

from openai import AsyncOpenAI
from society_ai import SocietyAgent, TaskContext

agent = SocietyAgent(
    name="kb-agent",
    description="Agent with local knowledge base",
)

# LLM client for answer generation (reads OPENAI_API_KEY from the environment)
llm = AsyncOpenAI()

# Example: use a local vector store for retrieval
# (Replace with your actual retrieval implementation)
async def search_documents(query: str, top_k: int = 5):
    """Search your local document index.

    Each returned chunk must expose .text and .source_file attributes.
    """
    # `your_vector_store` is a placeholder and is not defined here: wrap
    # ChromaDB, FAISS, LlamaIndex, etc. behind an object with an async
    # query(query, top_k=...) method.
    results = await your_vector_store.query(query, top_k=top_k)
    return results

@agent.skill(name="ask", description="Answer questions from my documentation")
async def ask(message: str, context: TaskContext) -> str:
    # 1. Search knowledge base
    chunks = await search_documents(message)

    # 2. Build context from relevant chunks, prefixing each with its source file
    context_text = "\n\n".join(
        f"[Source: {c.source_file}]\n{c.text}" for c in chunks
    )

    # 3. Use LLM with document context
    response = await llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question using the provided context. "
                    "Cite sources as [Source: filename]."
                ),
            },
            {"role": "user", "content": f"Context:\n{context_text}\n\nQuestion: {message}"},
        ],
    )
    return response.choices[0].message.content

agent.run()
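To try the skill without a vector database, a toy in-memory stand-in for the placeholder your_vector_store can expose the same async query(query, top_k=...) call and return objects with .text and .source_file. This is an assumption-laden sketch: it scores chunks by the fraction of query words they contain (simple lexical overlap), not by embeddings, so it does not reproduce natural-language semantic search.

```python
import asyncio
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_file: str
    score: float = 0.0


def _terms(text: str) -> set[str]:
    # Lowercased word tokens; punctuation (including hyphens) splits words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InMemoryStore:
    """Toy stand-in for a vector store: ranks chunks by query-term overlap."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, source_file: str, text: str) -> None:
        # Index each blank-line-separated paragraph as its own chunk.
        for para in text.split("\n\n"):
            if para.strip():
                self._chunks.append(Chunk(para.strip(), source_file))

    async def query(self, query: str, top_k: int = 5) -> list[Chunk]:
        q = _terms(query)
        scored = []
        for chunk in self._chunks:
            overlap = len(q & _terms(chunk.text))
            if overlap:
                # Fraction of query terms present in the chunk, in [0, 1].
                scored.append(Chunk(chunk.text, chunk.source_file, overlap / len(q)))
        scored.sort(key=lambda c: c.score, reverse=True)
        return scored[:top_k]


your_vector_store = InMemoryStore()
your_vector_store.add(
    "deployment-guide.md",
    "We use blue-green deployments.\n\nRollbacks take one click.",
)
```

Defining it under the name your_vector_store lets search_documents in the agent code run unchanged; swap it for a real store once retrieval quality matters.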

Step 3 -- How the Agent Uses KB

When the KB is connected, the agent gains a search_knowledge_base tool. Here is what happens when a user asks a question:

  1. The agent's LLM decides whether to call search_knowledge_base based on the question
  2. The tool sends a query to the KB service endpoint
  3. The service resolves the agent's configured sources from the database
  4. It fans out searches across all configured sources in parallel
  5. Results are merged, deduplicated, and sorted by relevance score
  6. The top-K chunks are returned to the agent
  7. The agent uses the chunks as context and cites them in the response

Search Results Format

Each chunk returned by the KB search includes:

| Field | Type | Description |
| --- | --- | --- |
| text | str | The relevant text passage |
| score | float | Relevance score (0-1, higher is better) |
| source_file | str | Name of the source document |
| page_number | int | Page number (if available) |
| metadata | dict | Additional metadata |
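On the client side, a chunk can be modeled as a small dataclass. This sketch assumes chunks arrive as JSON objects whose keys match the field names in the table; the KBChunk class and its from_dict parser are hypothetical, not part of the Society AI SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class KBChunk:
    text: str                          # the relevant text passage
    score: float                       # relevance score, 0-1, higher is better
    source_file: str                   # name of the source document
    page_number: Optional[int] = None  # absent for sources without pages
    metadata: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "KBChunk":
        score = float(raw["score"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        page = raw.get("page_number")
        return cls(
            text=raw["text"],
            score=score,
            source_file=raw["source_file"],
            page_number=int(page) if page is not None else None,
            metadata=dict(raw.get("metadata") or {}),
        )
```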

Citation Behavior

The agent's system prompt instructs it to cite sources when using KB information:

Cite sources when using knowledge base information: [Source: filename]

This means responses grounded in KB data include citations like:

The deployment process uses blue-green deployments for zero-downtime releases. [Source: deployment-guide.pdf]
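A client consuming agent responses may want to pull these citations back out, for example to link them or to check that the agent only cited files it actually retrieved. A minimal sketch; extract_citations and unknown_citations are hypothetical helper names, not part of the Society AI SDK.

```python
import re

# Matches the citation format from the system prompt: [Source: filename]
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")


def extract_citations(response: str) -> list[str]:
    """Return cited filenames in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for name in CITATION_RE.findall(response):
        seen.setdefault(name, None)
    return list(seen)


def unknown_citations(response: str, retrieved_files: set[str]) -> list[str]:
    """Cited filenames that were not among the chunks the agent retrieved."""
    return [name for name in extract_citations(response) if name not in retrieved_files]
```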

Multiple Source Types

You can mix agent documents and workspace connections. The agent searches all of them:

{
  "kb_sources": [
    {
      "type": "agent",
      "agent_id": "my-agent"
    }
  ],
  "workspace": [
    {
      "type": "space",
      "space_id": "engineering-space-uuid",
      "name": "Engineering"
    },
    {
      "type": "project",
      "space_id": "engineering-space-uuid",
      "project_id": "api-project-uuid",
      "name": "API Documentation"
    }
  ]
}

In this configuration, the agent searches:

  1. Files uploaded directly to my-agent
  2. All documents in the "Engineering" space
  3. Documents in the "API Documentation" project

Results from all sources are merged, deduplicated, and ranked by relevance score before being returned to the agent.

Updating the Knowledge Base

Adding Documents

Upload new files through the dashboard or API. They are indexed automatically and become searchable within minutes. No agent restart or redeployment is needed -- the KB service resolves sources from the database at query time.

Removing Documents

Delete documents through the dashboard. The document is removed from the search index and will no longer appear in results.

Workspace Changes

Changes to workspace connections (adding or removing spaces/projects) take effect immediately. The KB service reads the agent's configuration from the database on every search request.
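No restart is needed because the configuration is read per request instead of being cached at startup. A minimal sketch of that pattern, with a dict standing in for the database; ConfigStore and sources_for_request are hypothetical names.

```python
from typing import Any


class ConfigStore:
    """Stand-in for the database table that holds each agent's KB configuration."""

    def __init__(self) -> None:
        self._configs: dict[str, dict[str, Any]] = {}

    def save(self, agent_id: str, config: dict[str, Any]) -> None:
        self._configs[agent_id] = config

    def load(self, agent_id: str) -> dict[str, Any]:
        return self._configs.get(agent_id, {})


def sources_for_request(store: ConfigStore, agent_id: str) -> list[str]:
    """Resolve an agent's sources at request time, never from a startup cache."""
    config = store.load(agent_id)  # fresh read on every search
    sources = [f"agent:{s['agent_id']}" for s in config.get("kb_sources", [])]
    for w in config.get("workspace", []):
        if w["type"] == "project":
            sources.append(f"project:{w['space_id']}/{w['project_id']}")
        else:
            sources.append(f"space:{w['space_id']}")
    return sources
```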

Scoping Rules

KB search results are scoped based on how the source is configured:

| Source Config | What Gets Searched |
| --- | --- |
| Agent documents (kb_sources.type: agent) | Files uploaded to that specific agent |
| Space (workspace.type: space) | All documents in the space |
| Project (workspace.type: project) | Documents in the specific project within the space |

This scoping ensures agents only access documents they are configured to use.
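The scoping table translates into a simple filter. This sketch assumes each document carries the agent, space, and project IDs it belongs to (the Document fields are hypothetical), and it assumes a space scope also covers documents inside that space's projects, which is one reading of "all documents in the space".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Document:
    source_file: str
    agent_id: Optional[str] = None    # set for files uploaded directly to an agent
    space_id: Optional[str] = None    # set for workspace documents
    project_id: Optional[str] = None  # set when the document belongs to a project


def in_scope(doc: Document, source: dict[str, Any]) -> bool:
    """Apply the scoping rules to one document and one configured source."""
    kind = source["type"]
    if kind == "agent":
        return doc.agent_id == source["agent_id"]
    if kind == "space":
        # Assumption: a space scope includes documents in its projects.
        return doc.space_id == source["space_id"]
    if kind == "project":
        return doc.space_id == source["space_id"] and doc.project_id == source["project_id"]
    raise ValueError(f"unknown source type: {kind!r}")


def searchable(docs: list[Document], sources: list[dict[str, Any]]) -> list[Document]:
    # A document is searchable if at least one configured source covers it.
    return [d for d in docs if any(in_scope(d, s) for s in sources)]
```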

Best Practices

Document Preparation

  • Use descriptive filenames -- The filename appears in citations, so api-authentication-guide.pdf is better than doc1.pdf
  • Structure content clearly -- Headings, sections, and paragraphs help the retrieval system find relevant passages
  • Keep documents focused -- A 10-page guide on one topic retrieves better than a 200-page omnibus document
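To see why headings and focused documents help, consider a toy splitter that turns a document into one chunk per heading-led section, so each chunk stays on a single topic. This is only a local illustration assuming markdown-style # headings; actual chunking is done by the platform's indexing backend and is not described here.

```python
import re

# Markdown-style heading: 1-6 '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")


def split_by_headings(text: str) -> list[tuple[str, str]]:
    """Split text into (heading, body) sections; text before any heading gets ''."""
    sections: list[tuple[str, str]] = []
    heading, body = "", []

    def flush() -> None:
        # Emit the current section unless it is completely empty.
        content = "\n".join(body).strip()
        if heading or content:
            sections.append((heading, content))

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return sections
```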

Query Optimization

  • The search_knowledge_base tool uses natural language queries, not keyword search
  • The default top_k is 5 results, configurable up to 20
  • For better results, the agent should formulate specific questions rather than broad topics
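The tool's two inputs (a natural-language query and top_k, default 5, maximum 20) can be sketched as an OpenAI-style function-calling definition plus argument normalization. The schema the platform actually registers is not documented here, so SEARCH_KB_TOOL and normalize_args are hypothetical.

```python
import json
from typing import Any

DEFAULT_TOP_K = 5
MAX_TOP_K = 20

# OpenAI-style function-calling definition (hypothetical; the platform's
# actual tool schema may differ).
SEARCH_KB_TOOL = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search the agent's knowledge base with a specific natural-language question.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language question"},
                "top_k": {"type": "integer", "minimum": 1, "maximum": MAX_TOP_K, "default": DEFAULT_TOP_K},
            },
            "required": ["query"],
        },
    },
}


def normalize_args(raw_arguments: str) -> dict[str, Any]:
    """Parse the model's JSON arguments, apply the default, and clamp top_k to 1..20."""
    args = json.loads(raw_arguments)
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("query must be a non-empty string")
    top_k = args.get("top_k")
    top_k = DEFAULT_TOP_K if top_k is None else int(top_k)
    return {"query": query, "top_k": max(1, min(top_k, MAX_TOP_K))}
```

Clamping an out-of-range top_k, rather than rejecting the call, keeps a slightly wrong model request from failing the whole turn.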

KB vs Web Search

| Scenario | Best Tool |
| --- | --- |
| Questions about your own products, processes, or internal docs | Knowledge Base |
| Current events, public data, general knowledge | Web Search |
| Combining internal context with external data | Both (KB + web search) |
