Extract Sitemap URLs
Texau → Extract Sitemap URLs
/v1/texau-web-sitemap{ "url": "https://acme.com" }
{ "ok": true, "data": { "extracted_url": "https://acme.com" } }
Discovers and extracts URLs from a website's sitemap via sitemap.xml, robots.txt, or crawl discovery.
Install
Add extract sitemap urls to your MCP client.
Drop this into claude_desktop_config.json (or your client's equivalent) and the tool shows up in any chat.
{ "mcpServers": { "texau": { "command": "npx", "args": ["-y", "@texau/mcp-server"], "env": { "TEXAU_API_KEY": "..." } } } }
Tool name: texau__texau-web-sitemap
When to use this.
The "Extract Sitemap URLs" action, part of the enrichment category, facilitates the discovery and extraction of URLs from a website's sitemap using sources such as sitemap.xml, robots.txt, or through crawl discovery methods. By providing a target website URL as an input parameter, users can efficiently retrieve a comprehensive list of URLs associated with that site. The output is a collection of extracted URLs, categorized as semantic types of URL, text, and number. This action is particularly useful for SEO professionals, web developers, and data analysts who need to gather site structure information, enhance web crawling processes, or improve website indexing strategies.
Try it
Run a sample request.
The response is a deterministic, cached example. No live call, no credits used.
Extract Sitemap URLs
Response
Output schema.
Every field returned in `data`. Click rows to expand nested objects.
extracted_urlExtracted Urlstring
Integrate
Copy-pasteable snippets.
Real endpoint: https://v3-api.texau.com/api/v1/texau-web-sitemap. Auth: x-api-key.
/v1/texau-web-sitemapcurl -X POST 'https://v3-api.texau.com/api/v1/texau-web-sitemap' \ -H 'x-api-key: $TEXAU_API_KEY' \ -H 'content-type: application/json' \ -d '{"url":"https://acme.com"}'
{ "ok": true, "data": { "extracted_url": "https://acme.com" } }
Compose
How this fits a workflow.
The next 2 actions most operators chain after this one.
enrichment
Remove Extra Whitespace (Utility)
Cleans up text by removing extra spaces, tabs, and newlines while preserving single spaces.
enrichment
Identify Email Type (Utility)
Classifies an email address as personal, business, educational, or freemail and extracts the provider domain.
enrichment
Find URL Redirect Destination (Utility)
Resolves a URL to its final destination and returns the full redirect chain with HTTP status codes.
Output
Results land in a TexAu table.
Sample rows below.
Real result preview coming soon.
| Input | Status | Score |
|---|---|---|
| [email protected] | valid | 96 |
| [email protected] | risky | 54 |
| [email protected] | invalid | 12 |
Workflow
A real example.
Trigger → extract sitemap urls → enrich → push to your CRM. ~80 ms operator effort, the rest runs in the background.
Built for
Who runs this.
Reliability
Rate limits & reliability.
- Per-minute limit60 / min
- Per-day limit50,000 / day
- RetriesAutomatic w/ backoff
- ModeSync
Errors
HTTP status codes.
What each response means and what to do about it.
| Code | Cause | Fix |
|---|---|---|
| 200 OK | Action ran. Data in `data`. | Read response. |
| 400 Bad Request | Missing or malformed input. | Validate against the input schema. |
| 401 Unauthorized | Missing or invalid `x-api-key`. | Re-issue from /api-platform. |
| 403 Forbidden | Workspace lacks plan tier. | Upgrade or contact sales. |
| 404 Not Found | Action key not recognized. | Verify the slug. |
| 429 Rate Limited | Per-minute or per-day cap hit. | Backoff; reduce concurrency. |
| 500 Server Error | Unexpected TexAu issue. | Retry with backoff. |
| 502 Bad Gateway | Upstream provider 5xx. | Retry; we surface root cause. |
| 504 Timeout | Upstream slower than maxLatency. | Switch to `isAsync` polling. |
Related
More Texau actions.
enrichment
Remove Extra Whitespace (Utility)
Cleans up text by removing extra spaces, tabs, and newlines while preserving single spaces.
enrichment
Identify Email Type (Utility)
Classifies an email address as personal, business, educational, or freemail and extracts the provider domain.
enrichment
Find URL Redirect Destination (Utility)
Resolves a URL to its final destination and returns the full redirect chain with HTTP status codes.
enrichment
Format Date and Time (Utility)
Converts a date-time string to UTC timezone and extracts individual date and time components.
FAQ.
Is this real-time?
Yes. Synchronous actions return in ~1–4 s. Long-running work uses async polling (see status 504 → switch to async).
Do I get charged on failure?
No. Verified failures cost zero credits. Provider miss / 5xx / timeout cascade to the next provider in the waterfall when applicable.
Does it work with Claude / Cursor via MCP?
Yes. Add the texau MCP server to your client config, then call `texau__texau-...` directly.
What CRMs can I push results to?
HubSpot, Salesforce, Pipedrive, Zoho, and GoHighLevel are bidirectional. Smartlead, Instantly, Lemlist, HeyReach, Apollo Sequences, and Reply.io for outbound.
Run Extract Sitemap URLs in 60 seconds.
Pull your API key, paste the cURL, ship to your CRM.