Websites
The Websites page lets you add public web URLs to your agent's knowledge base. The platform visits each URL, extracts the main text content, and indexes it as searchable knowledge — so your agent can reference and cite live web pages the same way it does uploaded documents.
Navigate to Resources → Websites in the left sidebar.
Managing the Websites List
The list shows all extracted websites registered to this agent. Each entry represents a single URL that has been, or is being, processed. Use Search Websites to filter by title, code, or status — click it again to collapse the panel. Use the Sort By and Sort Direction controls in the top-right to order entries; your preference is persisted across page visits.
Grounding your agent in real web content reduces the risk of hallucinations by anchoring answers to verifiable sources.
Adding a New Website
Click New Website in the top-left to open the website form.
Website Source
Enter the full URL you want to index. Only http:// and https:// protocols are accepted. The URL must contain a valid hostname (e.g., example.com), must not exceed 2 048 characters, and must not point to localhost or private IP addresses.
After entering a URL, click Get Metadata to automatically populate the title and thumbnail from the page.
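The URL rules above can be checked before you submit the form. The following is a minimal sketch in Python, not the platform's actual validator: the function name `check_website_url` and the wording of the messages are hypothetical, and only the rules themselves (http/https only, a valid hostname, at most 2 048 characters, no localhost or private IP addresses) come from this page.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # limit stated in the Website Source rules


def check_website_url(url: str) -> list[str]:
    """Return a list of rule violations; an empty list means the URL passes.

    Hypothetical helper for illustration only.
    """
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("URL exceeds 2048 characters")

    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return problems + ["URL could not be parsed"]

    if parts.scheme not in ("http", "https"):
        problems.append("protocol must be http:// or https://")
    if not host:
        problems.append("URL has no hostname")
        return problems
    if host == "localhost" or host.endswith(".localhost"):
        problems.append("localhost is not allowed")
        return problems

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return problems  # a DNS name, not an IP literal
    if address.is_private or address.is_loopback or address.is_link_local:
        problems.append("private or loopback IP addresses are not allowed")
    return problems


print(check_website_url("https://example.com/docs"))
print(check_website_url("http://192.168.1.10/intranet"))
```

This sketch only inspects the URL string. It does not resolve DNS, so a public-looking hostname that resolves to a private address would still pass here; whether the platform performs that additional check is not stated on this page.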
Ingestion Mode
Ingestion Mode controls how the extracted web text is split into chunks before it is stored in the knowledge base. Two modes are available for web resources:
| Mode | Best for |
|---|---|
| Skimming | General web pages, articles, and most sites. Balanced speed and quality. (default) |
| Semantic | Content where precise, context-aware retrieval matters most. |
Click the ⋯ button next to the dropdown to open the ingestion parameters panel for the selected mode.
Skimming parameters
| Parameter | Description |
|---|---|
| Content Training AI Model | Model used for knowledge base training and embedding generation. |
| Use Summarization | Adds brief summaries to higher-level sections to improve context. Useful for structured or long content. Additional usage may apply. |
| Keep Small Sections Separate (advanced) | Prevents the system from merging small sections into larger chunks. Turning this on may reduce retrieval quality. |
| Min Chunk Tokens (advanced, visible only when Keep Small Sections Separate is off) | Token threshold for merging: a section smaller than this value is merged with a neighbouring section. |
Semantic parameters
| Parameter | Description |
|---|---|
| Content Size (tokens) | How many tokens go into each chunk. Smaller chunks (200–400) give precise answers; larger (600–1 000) preserve broader context. Recommended: 400–600. |
| Overlapping Ratio (%) | How much content each chunk shares with the next (1–80 %). Higher values improve flow across boundaries but increase cost. Recommended: 10–20 %. |
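To see how Content Size and Overlapping Ratio interact, here is a minimal sliding-window sketch in Python. It is an illustration under stated assumptions, not the platform's chunker: the function name `chunk_with_overlap` is hypothetical, and a plain list of strings stands in for real tokenizer output.

```python
def chunk_with_overlap(tokens: list[str], content_size: int,
                       overlap_percent: int) -> list[list[str]]:
    """Split tokens into chunks of content_size that share overlap_percent
    of their content with the next chunk. Hypothetical helper."""
    if content_size < 1:
        raise ValueError("content_size must be at least 1")
    if not 1 <= overlap_percent <= 80:
        raise ValueError("overlap_percent must be between 1 and 80")

    overlap = content_size * overlap_percent // 100  # tokens shared with the next chunk
    step = content_size - overlap                    # how far the window advances

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + content_size])
        if start + content_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Stand-in for tokenizer output: 1000 placeholder tokens.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_with_overlap(tokens, content_size=400, overlap_percent=20)
print([len(c) for c in chunks])  # [400, 400, 360]
```

With 400-token chunks and a 20 % overlap, each window advances by 320 tokens, so 1 000 tokens produce three chunks instead of the two and a half that a non-overlapping split would need. Raising the ratio shrinks the step and increases the number of chunks stored, which is where the extra cost comes from.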
Content Extraction
Content Extraction controls how many pages the platform visits from the starting URL.
| Option | Behaviour |
|---|---|
| Single Page | Extracts only the page at the URL you entered. |
| Related Pages | Follows links from the starting page and extracts additional pages. |
When Related Pages is selected, click the ⋯ button to configure the crawl parameters:
| Parameter | Description |
|---|---|
| Max Pages | Maximum number of pages to extract. Lower values (10–20) are faster and cheaper; higher values (30–50) give broader coverage. Recommended starting point: 30. |
| Max Depth | How many link levels deep to follow from the starting page. Depth 1 = direct links only; Depth 2 = links of links (most common); Depth 3+ = broader coverage, but with a higher risk of collecting irrelevant content. Recommended: 2. |
| Keywords (optional) | Enter up to 10 keywords (50 characters each) to focus collection on pages containing those terms. Leave empty to extract all reachable pages without filtering. |
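The way the three crawl parameters combine can be sketched as a breadth-first walk over a site's links. The Python below is a hedged illustration, not the platform's crawler: `select_pages` and the in-memory `links` and `text` dictionaries are hypothetical stand-ins for real page fetching. It also assumes that the starting page counts toward Max Pages, that the keyword filter applies to every page including the first, and that links on non-matching pages are still followed; the platform may behave differently on each of these points.

```python
from collections import deque


def select_pages(start, links, text, max_pages=30, max_depth=2, keywords=()):
    """Return the pages a depth- and size-limited crawl would keep.

    links: dict mapping a URL to the URLs it links to (hypothetical input)
    text:  dict mapping a URL to its page text (hypothetical input)
    """
    if len(keywords) > 10 or any(len(k) > 50 for k in keywords):
        raise ValueError("at most 10 keywords of up to 50 characters each")
    wanted = [k.lower() for k in keywords]

    selected = []
    seen = {start}
    queue = deque([(start, 0)])  # depth 0 is the starting page
    while queue and len(selected) < max_pages:
        url, depth = queue.popleft()
        body = text.get(url, "").lower()
        # An empty keyword list keeps every reachable page.
        if not wanted or any(k in body for k in wanted):
            selected.append(url)
        if depth < max_depth:
            for target in links.get(url, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return selected


links = {"/": ["/a", "/b"], "/a": ["/a1"], "/b": ["/b1"], "/a1": ["/deep"]}
text = {"/": "home pricing", "/a": "pricing plans", "/b": "careers",
        "/a1": "pricing faq", "/b1": "press", "/deep": "pricing archive"}

print(select_pages("/", links, text, max_depth=1))
print(select_pages("/", links, text, max_depth=2, keywords=["pricing"]))
```

In this toy site, Depth 1 stops at the pages linked directly from the start, while Depth 2 also reaches `/a1` and `/b1` but never `/deep`. Adding the keyword `pricing` drops `/b` and `/b1` from the result without changing which links are explored.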
Title, Author, and Code
After the URL and mode are set, fill in the remaining metadata:
- Title — The display name for this website entry.
- Author — Select a registered author from the lookup.
- Code — A short unique identifier you can edit freely (e.g., WEB-01).
You can also attach a thumbnail — either by URL or file upload — from the Thumbnail section that appears below.
Related pages
- Documents — upload file-based knowledge sources
- Videos — add video content as a knowledge source
- Search Chunks — inspect the text units the agent retrieves from extracted web content
- Advanced — tune retrieval settings that apply to all resource types