
Websites

The Websites page lets you add public web URLs to your agent's knowledge base. The platform visits each URL, extracts the main text content, and indexes it as searchable knowledge — so your agent can reference and cite live web pages the same way it does uploaded documents.

Navigate to Resources → Websites in the left sidebar.

Managing the Websites List

The list shows all extracted websites registered to this agent. Each entry represents a single URL that has been, or is being, processed. Click Search Websites to open a filter panel where you can filter by title, code, or status; click it again to collapse the panel. Use the Sort By and Sort Direction controls in the top-right to order entries. Your sort preference persists across page visits.

Grounding your agent in real web content reduces the risk of hallucinations by anchoring answers to verifiable sources.

Adding a New Website

Click New Website in the top-left to open the website form.

Website Source

Enter the full URL you want to index. Only http:// and https:// protocols are accepted. The URL must contain a valid hostname (e.g., example.com), must not exceed 2 048 characters, and must not point to localhost or private IP addresses.
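
If you want to pre-check URLs before pasting them into the form (for example, when preparing a long list), the rules above can be approximated as follows. This is a minimal sketch, not the platform's own validator: the function name check_website_url and the error messages are hypothetical, and only the rules themselves come from this page.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # limit stated on this page


def check_website_url(url: str) -> list[str]:
    """Return a list of problems; an empty list means the URL passes these checks.

    Hypothetical helper: it mirrors the documented rules, not the platform's code.
    """
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("URL exceeds 2048 characters")

    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased host without port or brackets
    except ValueError:
        return problems + ["URL could not be parsed"]

    if parts.scheme not in ("http", "https"):
        problems.append("only http:// and https:// are accepted")

    if not host:
        problems.append("URL has no hostname")
        return problems

    if host == "localhost" or host.endswith(".localhost"):
        problems.append("localhost is not allowed")
        return problems

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return problems  # a DNS name rather than an IP literal

    if address.is_private or address.is_loopback or address.is_link_local:
        problems.append("private or loopback IP addresses are not allowed")
    return problems


for candidate in ("https://example.com/docs", "ftp://example.com/file", "http://192.168.1.10/"):
    print(candidate, check_website_url(candidate))
```

The sketch inspects only the URL text. A public-looking hostname that resolves to a private address would pass here, and whether the platform rejects such a URL is not stated on this page.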

After entering a URL, click Get Metadata to automatically populate the title and thumbnail from the page.

Ingestion Mode

Ingestion Mode controls how the extracted web text is split into chunks before it is stored in the knowledge base. Two modes are available for web resources:

Mode	Best for
Skimming	General web pages, articles, and most sites. Balanced speed and quality. (default)
Semantic	Content where precise, context-aware retrieval matters most.

Click the ⋯ button next to the dropdown to open the ingestion parameters panel for the selected mode.

Skimming parameters

Parameter	Description
Content Training AI Model	Model used for knowledge base training and embedding generation.
Use Summarization	Adds brief summaries to higher-level sections to improve context. Useful for structured or long content. Additional usage may apply.
Keep Small Sections Separate (advanced)	Prevents the system from merging small sections into larger chunks. Turning this on may reduce retrieval quality.
Min Chunk Tokens (advanced, visible only when Keep Small Sections Separate is off)	Minimum chunk size, in tokens. Sections smaller than this are merged with adjacent sections.
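
To build intuition for the two advanced parameters, here is one plausible way a minimum-size merge could behave. It is a sketch under stated assumptions, not the platform's algorithm: the function merge_small_sections is hypothetical, tokens are approximated by whitespace-separated words, and the greedy left-to-right strategy is a guess at the general idea.

```python
def merge_small_sections(
    sections: list[str],
    min_chunk_tokens: int,
    keep_small_separate: bool = False,
) -> list[str]:
    """Greedily merge adjacent sections until each chunk reaches the minimum size.

    Assumption: token counts are approximated by whitespace-separated words.
    """
    if keep_small_separate:
        return list(sections)  # merging disabled: every section stays its own chunk

    merged: list[str] = []
    buffer = ""
    for section in sections:
        buffer = f"{buffer}\n{section}" if buffer else section
        if len(buffer.split()) >= min_chunk_tokens:
            merged.append(buffer)
            buffer = ""

    if buffer:  # trailing text that never reached the minimum
        if merged:
            merged[-1] = f"{merged[-1]}\n{buffer}"
        else:
            merged.append(buffer)
    return merged


sections = ["Opening hours", "Mon to Fri", "We are closed on public holidays", "Contact"]
print(merge_small_sections(sections, min_chunk_tokens=3))
print(merge_small_sections(sections, min_chunk_tokens=3, keep_small_separate=True))
```

With merging on, short headings end up in the same chunk as the text around them, which is one reason keeping small sections separate may reduce retrieval quality.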

Semantic parameters

Parameter	Description
Content Size (tokens)	Number of tokens in each chunk. Smaller chunks (200–400) give more precise answers; larger chunks (600–1 000) preserve broader context. Recommended: 400–600.
Overlapping Ratio (%)	How much content each chunk shares with the next (1–80 %). Higher values improve flow across boundaries but increase cost. Recommended: 10–20 %.
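
The interaction between Content Size and Overlapping Ratio can be illustrated with a plain sliding window. This is only a sketch of how the two numbers relate: the platform's Semantic mode presumably does more than fixed-size windowing, the function chunk_with_overlap is hypothetical, and the defaults (500 tokens, 15 %) are simply picked from the recommended ranges above.

```python
def chunk_with_overlap(
    tokens: list[str],
    content_size: int = 500,
    overlap_pct: int = 15,
) -> list[list[str]]:
    """Split tokens into windows of content_size that share overlap_pct % with the next."""
    if content_size < 1:
        raise ValueError("content_size must be at least 1")
    if not 1 <= overlap_pct <= 80:
        raise ValueError("overlap_pct must be between 1 and 80")

    overlap = content_size * overlap_pct // 100  # tokens shared by neighbouring chunks
    step = content_size - overlap                # always >= 1 because overlap <= 80 %

    chunks: list[list[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + content_size])
        if start + content_size >= len(tokens):
            break  # this window already reached the end of the text
        start += step
    return chunks


tokens = [f"t{i}" for i in range(25)]
for chunk in chunk_with_overlap(tokens, content_size=10, overlap_pct=20):
    print(len(chunk), chunk[0], chunk[-1])
```

Because each window advances by content size minus overlap, a higher ratio produces more chunks for the same text, which is where the extra cost comes from.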

Content Extraction

Content Extraction controls how many pages the platform visits from the starting URL.

Option	Behaviour
Single Page	Extracts only the page at the URL you entered.
Related Pages	Follows links from the starting page and extracts additional pages.

When Related Pages is selected, click the ⋯ button to configure the crawl parameters:

Parameter	Description
Max Pages	Maximum number of pages to extract. Lower values (10–20) are faster and cheaper; higher values (30–50) give broader coverage. Recommended starting point: 30.
Max Depth	How many link levels to follow from the starting page. Depth 1 = direct links only; Depth 2 = links of links (most common); Depth 3+ = broader coverage, with a higher risk of collecting irrelevant content. Recommended: 2.
Keywords (optional)	Enter up to 10 keywords (50 characters each) to focus collection on pages containing those terms. Leave empty to extract all reachable pages without filtering.
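
The way Max Pages, Max Depth, and Keywords bound a crawl can be sketched as a breadth-first walk over links. The example below runs on an in-memory map of pages instead of fetching anything, and it is an illustration under assumptions rather than the platform's crawler: the function plan_crawl is hypothetical, the starting page is treated as depth 0, and it assumes that a page failing the keyword filter is skipped while its links are still followed.

```python
from collections import deque


def plan_crawl(
    start_url: str,
    pages: dict[str, tuple[str, list[str]]],
    max_pages: int = 30,
    max_depth: int = 2,
    keywords: tuple[str, ...] = (),
) -> list[str]:
    """Return the URLs that would be collected, in breadth-first order.

    pages maps a URL to (page text, outgoing links) and stands in for real fetching.
    """
    if len(keywords) > 10 or any(len(k) > 50 for k in keywords):
        raise ValueError("use up to 10 keywords of at most 50 characters each")

    wanted = [k.lower() for k in keywords]
    collected: list[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); the starting page is depth 0

    while queue and len(collected) < max_pages:
        url, depth = queue.popleft()
        if url not in pages:
            continue  # unreachable page in this toy data set
        text, links = pages[url]
        if not wanted or any(k in text.lower() for k in wanted):
            collected.append(url)
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return collected


site = {
    "https://example.com/": ("Home and pricing overview", ["https://example.com/pricing", "https://example.com/careers"]),
    "https://example.com/pricing": ("Pricing details", ["https://example.com/pricing/faq"]),
    "https://example.com/careers": ("Open roles", []),
    "https://example.com/pricing/faq": ("Pricing FAQ", []),
}
print(plan_crawl("https://example.com/", site, max_depth=1))
print(plan_crawl("https://example.com/", site, keywords=("pricing",)))
```

Breadth-first order means that when Max Pages is reached, the pages closest to the starting URL are the ones that were kept, which is why a modest depth with a moderate page limit is a reasonable starting point.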

Title, Author, and Code

After the URL and mode are set, fill in the remaining metadata:

Title — The display name for this website entry.
Author — Select a registered author from the lookup.
Code — A short unique identifier you can edit freely (e.g., WEB-01).

You can also attach a thumbnail — either by URL or file upload — from the Thumbnail section that appears below.

See Also

Documents — upload file-based knowledge sources
Videos — add video content as a knowledge source
Search Chunks — inspect the text units the agent retrieves from extracted web content
Advanced — tune retrieval settings that apply to all resource types
