Documents
The Documents page is where you upload and manage the files that form your agent's primary knowledge base. Supported formats are PDF, Markdown (.md), and plain text (.txt). Each uploaded document is parsed, split into searchable chunks, and indexed — after which the agent can draw on its content when answering user queries.
Navigate to Resources → Documents in the left sidebar.
Managing the Documents List
The list displays documents as cards (grid view) or rows (list view). Use the view toggle in the top-right to switch between them. Each card shows the document's title, code, author, status badge, and processing progress.
Document statuses: Draft → Pending → Processing → Processed (ready) or Failed (error). A Processed document that has been deactivated shows the status Suspended.
Use Search Documents to filter by code, title, author, or status. Use the Sort By controls in the top-right to change the sort order.
Creating a New Document
Click New Document in the top-left of the page. This opens the document form where you can:
- Enter a title, code, and description
- Upload the file
- Assign an author
- Set the document's language and content type
- Submit for processing
Ingestion Modes
When you upload a document, you choose an Ingestion Mode that controls how the system splits the document into searchable chunks. The right mode depends on the type and length of your document.
Quick Comparison
| Mode | Best For | AI Processing | Speed |
|---|---|---|---|
| Semantic | General documents — prose, reports, articles | None | Fast |
| Skimming | Long structured documents — books, manuals, multi-chapter reports | Low–Medium | Slow |
| Regulatory Context | Legal documents — laws, bylaws, regulations | Minimal | Slow |
| Fact Sheet | Short documents — product sheets, fund fact sheets (1–3 pages) | Medium | Medium |
| Adaptive | Complex reference books, documents with heavy cross-references | High | Slowest |
Not sure which to pick? Choose Semantic for general prose where structure does not matter much. Choose Skimming when the document has clear chapters and sections, because it preserves the document's natural structure.
Semantic
The simplest and most reliable mode. The document is split at natural sentence boundaries into evenly sized chunks. Each chunk overlaps slightly with the previous one so context is not lost at the edges.
Best for
- General-purpose documents where structure does not matter much
- Reports, articles, FAQs, knowledge base files
- Any document when you are unsure which mode to use
Parameters
| Parameter | Description |
|---|---|
| Content Size | Maximum number of tokens per chunk. Larger values produce fewer, longer chunks. Default: 500. |
| Overlapping Ratio | Percentage of sentences from the previous chunk that are repeated at the start of the next chunk. Higher values preserve more context across boundaries. Range: 1–80%. Default: 20%. |
| Start Page | First page to process. Leave blank to start from page 1. |
| Finish Page | Last page to process. Leave blank to process to the end. |
Advanced Parameters (optional)
| Parameter | Description |
|---|---|
| Table as Plain Text | When on, tables are extracted as plain text inside regular chunks. When off, each table becomes its own separate chunk. |
| Skip Image Analysis | When on, images are ignored. When off, an AI model generates a description for each image and includes it as a chunk. |
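
The interaction between Content Size and Overlapping Ratio can be illustrated with a minimal sketch. This is not the product's implementation: the regex sentence splitter, the whitespace-based token count, and the function names are all assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def semantic_chunks(text: str, content_size: int = 500,
                    overlap_ratio: float = 0.20) -> list[str]:
    chunks: list[list[str]] = []
    current: list[str] = []
    fresh = 0  # sentences in `current` that are not carried-over overlap
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        used = sum(count_tokens(s) for s in current)
        if fresh and used + n > content_size:
            chunks.append(current)
            # Repeat a share of the previous chunk's sentences (at least one).
            carry = max(1, round(len(current) * overlap_ratio))
            current = current[-carry:]
            fresh = 0
            # Trim the overlap if it leaves no room for the new sentence.
            while current and sum(count_tokens(s) for s in current) + n > content_size:
                current.pop(0)
        current.append(sentence)
        fresh += 1
    if fresh:
        chunks.append(current)
    return [" ".join(c) for c in chunks]
```

With a small Content Size the overlap is easy to see: each chunk after the first starts with sentences repeated from the end of the previous one. A single sentence longer than Content Size is kept whole in this sketch rather than split.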
Skimming
Detects the document's chapter and section structure using AI, then creates one chunk per section — each chunk is titled with the full path of headings (e.g., Chapter 1 › Section 1.2 › Subsection 1.2.3). Sections that are too small are automatically merged with their neighbours; sections that are too large are split at semantic boundaries.
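As a rough illustration of the heading-path titles and the merging of small sections, consider the sketch below. The `Section` type, the whitespace token count, and the choice to keep the earlier section's title after a merge are assumptions, not the product's documented behaviour. Splitting oversized sections is left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    path: list[str]  # headings from the top level down to this section
    text: str


def title(section: Section) -> str:
    # Join the full heading path, e.g. "Chapter 1 › Section 1.2".
    return " › ".join(section.path)


def token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def merge_small(sections: list[Section], min_tokens: int) -> list[Section]:
    merged: list[Section] = []
    for sec in sections:
        # Merge when either this section or the one before it is too small.
        if merged and (token_count(merged[-1].text) < min_tokens
                       or token_count(sec.text) < min_tokens):
            prev = merged[-1]
            # Assumption: the merged chunk keeps the earlier section's path.
            merged[-1] = Section(prev.path, prev.text + "\n\n" + sec.text)
        else:
            merged.append(sec)
    return merged
```

In this sketch a short section is folded into the one before it, and a short leading section absorbs the one that follows.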
Best for
- Books, technical manuals, training materials
- Any document with clear chapters, sections, and subsections
- Documents where preserving heading structure in search results matters
Parameters
| Parameter | Description |
|---|---|
| Content Training AI Model | The AI model used to analyse the document structure. A more capable model may detect structure more accurately on complex documents. |
| Use Summarization | When on, the system generates an AI summary for each parent section (e.g., a chapter summary derived from its subsections). Improves context for high-level queries but increases processing time and cost. |
Advanced Parameters (optional)
| Parameter | Description |
|---|---|
| Page Overlap | Number of pages shared between adjacent processing segments when the document is too large to analyse in one pass. Helps the AI detect section boundaries that span segment edges. Default: 2. |
| Keep Small Sections Separate | When on, small sections are not merged with their neighbours and are kept exactly as the AI detected them. Useful if you want strict per-section chunks regardless of size. |
| Minimum Section Size | Sections with fewer tokens than this value are merged with an adjacent section. Only applies when Keep Small Sections Separate is off. |
| Table as Plain Text | Same as in Semantic mode. |
| Skip Image Analysis | Same as in Semantic mode. |
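
To make Page Overlap concrete, the sketch below computes overlapping page ranges for a document that is analysed in several passes. The `segment_size` parameter is hypothetical: this page does not say how many pages fit in one pass, so the value is purely illustrative.

```python
def page_segments(total_pages: int, segment_size: int,
                  page_overlap: int = 2) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) page ranges.

    Adjacent ranges share `page_overlap` pages.
    """
    if segment_size <= page_overlap:
        raise ValueError("segment_size must be larger than page_overlap")
    if total_pages <= 0:
        return []
    segments: list[tuple[int, int]] = []
    start = 1
    while True:
        end = min(start + segment_size - 1, total_pages)
        segments.append((start, end))
        if end == total_pages:
            break
        # Step back so the next segment repeats the last `page_overlap` pages.
        start = end - page_overlap + 1
    return segments
```

For a 10-page document with a hypothetical segment size of 5 and the default overlap of 2, pages 4 and 5 appear in both the first and second segment, so a section boundary on those pages is seen in full by at least one pass.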
Regulatory Context
A dedicated parser built specifically for legal documents. It extracts each article individually — number, title, legal text, notes, and cross-references — and converts each article directly into one chunk. The article text is preserved verbatim; nothing is rewritten or summarised.
Best for
- Laws, government regulations, bylaws
- Compliance documents, legal codes
- Any document where the exact wording of each article must be preserved
Parameters
No additional parameters are required. The parser handles structure detection automatically based on standard legal document patterns.
Fact Sheet
Processes the document one page at a time. Each page — including its layout, images, and tables — is sent to an AI model that decides the chunk boundaries, writes the content in clean Markdown, and extracts key metadata (product name, product type, document type, version).
Best for
- Fund fact sheets, product brochures, one-pagers
- Short documents of 1–3 pages with mixed text, tables, and charts
- Documents where preserving the visual layout and chart context matters
Parameters
No additional parameters are required. The AI model handles layout analysis and chunk boundaries automatically.
Adaptive
The most sophisticated mode. A two-phase AI pipeline first analyses the document's full structure using a reasoning model, then assembles each chunk with additional context: cross-referenced content is embedded inline, and ambiguous pronouns (e.g., "he", "it", "they") are resolved to their actual referents. The resulting chunks are independently understandable: each one makes sense without its surrounding context.
Best for
- Large reference books and encyclopaedias
- Documents with dense cross-references (e.g., "see Article 3" or "as described in Chapter 7")
- Academic or technical documents where out-of-context sentences lose their meaning
Parameters
| Parameter | Description |
|---|---|
| Document Type Hint | Optional free-text description of the document type (e.g., "educational textbook", "regulatory filing", "product manual"). Helps the AI understand context and improve structure detection. |
| Image Description AI Model | The AI model used to generate descriptions for images found in the document. |
| Excluded Pages | Page types to exclude from chunking. By default the system skips: cover pages, table of contents, references, glossary, copyright, and acknowledgments. You can override this list here. |
Advanced Parameters (optional)
| Parameter | Description |
|---|---|
| Force Full LLM Enrichment | When on, every section is processed through the AI enrichment pipeline, even sections that would normally be assembled programmatically. Produces richer context but increases processing time significantly. |
| LLM Enrichment Sections Threshold | If the number of sections eligible for AI enrichment exceeds this value, the system falls back to fully programmatic assembly for all sections. Increase this threshold to allow more AI enrichment on large documents. |
| Minimum Chunk Size | Minimum number of tokens per chunk. Chunks smaller than this are merged with adjacent chunks. |
| Maximum Chunk Size | Maximum number of tokens per chunk. Chunks exceeding this are split at semantic boundaries. |
| Start Page | First page to process. Leave blank to start from page 1. |
| Finish Page | Last page to process. Leave blank to process to the end. |
| Page Overlap | Number of pages shared between adjacent processing segments. Same as in Skimming mode. |
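
The two enrichment settings can be read as a simple decision rule, sketched below. The function and enum names are hypothetical, and the sketch assumes that Force Full LLM Enrichment takes precedence over the threshold, which this page does not state.

```python
from enum import Enum


class Assembly(Enum):
    PROGRAMMATIC = "programmatic"
    LLM = "llm"


def plan_assembly(eligible: list[bool], threshold: int,
                  force_full: bool = False) -> list[Assembly]:
    """Pick an assembly strategy for each section.

    `eligible[i]` is True when section i would normally get AI enrichment.
    """
    if force_full:
        # Assumption: forcing enrichment overrides the threshold check.
        return [Assembly.LLM] * len(eligible)
    if sum(eligible) > threshold:
        # Too many eligible sections: fall back to programmatic for all.
        return [Assembly.PROGRAMMATIC] * len(eligible)
    return [Assembly.LLM if e else Assembly.PROGRAMMATIC for e in eligible]
```

Under these assumptions, raising the threshold lets more eligible sections through to AI enrichment before the all-programmatic fallback applies.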
Deactivating and Activating a Document
A document that has been fully processed (status: Processed) can be deactivated to prevent the agent from using it in responses, without permanently deleting it. A deactivated document can be reactivated at any time.
Deactivate a document
- Open the document detail page.
- Click Deactivate in the top-right action bar.
- The document status changes to Suspended. The agent will no longer reference this document.
Activate a document
- Open the detail page of a suspended document.
- Click Activate in the top-right action bar.
- The document status returns to Processed, and the document becomes available to the agent again.
Delete a document permanently
A suspended document can be permanently deleted.
- Open the detail page of a suspended document.
- Click Delete in the top-right action bar.
- Confirm the deletion. The document and all its indexed chunks are permanently removed and cannot be recovered.
To delete a document that is still Processed, deactivate it first, then delete it.
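The deactivate, activate, and delete rules above amount to a small state machine, sketched below. The action names and the "Deleted" marker are hypothetical labels used for illustration. They are not statuses or API calls from the product.

```python
# Allowed transitions: (current status, action) -> new status.
# "Deleted" is a hypothetical marker for a permanently removed document.
ALLOWED = {
    ("Processed", "deactivate"): "Suspended",
    ("Suspended", "activate"): "Processed",
    ("Suspended", "delete"): "Deleted",
}


def apply_action(status: str, action: str) -> str:
    """Return the new status, or raise if the action is not permitted."""
    try:
        return ALLOWED[(status, action)]
    except KeyError:
        raise ValueError(
            f"Cannot {action} a document with status {status}"
        ) from None


def delete_document(status: str) -> str:
    # A Processed document must be deactivated before it can be deleted.
    if status == "Processed":
        status = apply_action(status, "deactivate")
    return apply_action(status, "delete")
```

Deleting directly from Processed is rejected by `apply_action`, which mirrors the rule that a document must be suspended first.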
Related pages
- Videos — add video content as a knowledge source
- Search Chunks — inspect the text units the agent retrieves from your documents
- Citations — manage authors and citation display settings
- Advanced — tune how many document chunks the agent uses per query