Crawl Website Content (Apify)
/v1/apify-website-content-crawler
{ "url": "https://acme.com" }
{ "ok": true, "data": { "page_text": "sample", "page_title": "sample", "page_description": "sample", "page_canonical_url": "https://acme.com", "page_language_code": "sample", "crawl_loaded_url": "https://acme.com", "crawl_referrer_url": "https://acme.com" } }
Crawls website pages and extracts structured content including text, metadata, and URLs.
Install
Add Crawl Website Content (Apify) to your MCP client.
Drop this into claude_desktop_config.json (or your client's equivalent) and the tool shows up in any chat.
{
  "mcpServers": {
    "texau": {
      "command": "npx",
      "args": ["-y", "@texau/mcp-server"],
      "env": { "TEXAU_API_KEY": "..." }
    }
  }
}
Tool name: texau__apify-website-content-crawler
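If your client config already lists other MCP servers, you will likely want to merge the `texau` entry in rather than overwrite the file. The sketch below shows one way to do that on an already-loaded config dictionary. The function name `add_texau_server` is hypothetical; the entry itself is copied from the config above.

```python
import copy


def add_texau_server(config: dict, api_key: str) -> dict:
    """Return a copy of `config` with the texau MCP server entry added.

    Existing entries under "mcpServers" are preserved; an existing
    "texau" entry is replaced.
    """
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["texau"] = {
        "command": "npx",
        "args": ["-y", "@texau/mcp-server"],
        "env": {"TEXAU_API_KEY": api_key},
    }
    return merged
```

Load the existing file with `json.load`, pass the result through this function, and write it back with `json.dump`.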
When to use this.
The "Crawl Website Content (Apify)" action crawls the pages of a website and extracts structured content: text, metadata, and URLs. It takes a single input parameter, the website URL, and returns seven fields: page text, title, description, canonical URL, language code, loaded URL, and referrer URL, all formatted as text or URL types. It is most useful for data enrichment, where structured web content feeds SEO analysis, competitive research, and content aggregation.
Try it
Run a sample request.
The response is a deterministic, cached example. No live call, no credits used.
Output schema.
Every field returned in `data`.
| Field | Label | Type |
|---|---|---|
| `page_text` | Page Text | text |
| `page_title` | Page Title | text |
| `page_description` | Page Description | text |
| `page_canonical_url` | Page Canonical Url | text (nullable) |
| `page_language_code` | Page Language Code | text (nullable) |
| `crawl_loaded_url` | Crawl Loaded Url | text |
| `crawl_referrer_url` | Crawl Referrer Url | text (nullable) |
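To make the nullable fields explicit on the client side, the schema can be mirrored in a small typed container. This is a minimal sketch: the names `CrawlResult` and `parse_crawl_response` are hypothetical, and it assumes a nullable field arrives either as `null` or is omitted from `data`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrawlResult:
    # Required text fields from the output schema.
    page_text: str
    page_title: str
    page_description: str
    crawl_loaded_url: str
    # Fields marked nullable in the output schema.
    page_canonical_url: Optional[str] = None
    page_language_code: Optional[str] = None
    crawl_referrer_url: Optional[str] = None


def parse_crawl_response(envelope: dict) -> CrawlResult:
    """Convert the {"ok": ..., "data": {...}} envelope into a CrawlResult."""
    if not envelope.get("ok"):
        raise ValueError("response envelope does not have ok == true")
    data = envelope["data"]
    return CrawlResult(
        page_text=data["page_text"],
        page_title=data["page_title"],
        page_description=data["page_description"],
        crawl_loaded_url=data["crawl_loaded_url"],
        # .get() returns None when a nullable field is absent.
        page_canonical_url=data.get("page_canonical_url"),
        page_language_code=data.get("page_language_code"),
        crawl_referrer_url=data.get("crawl_referrer_url"),
    )
```

Code that consumes a `CrawlResult` then has to handle `None` for the three nullable fields, which is easy to overlook when working with the raw dictionary.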
Integrate
Copy-pasteable snippets.
Real endpoint: https://v3-api.texau.com/api/v1/apify-website-content-crawler. Auth: x-api-key.
/v1/apify-website-content-crawler
curl -X POST 'https://v3-api.texau.com/api/v1/apify-website-content-crawler' \
  -H "x-api-key: $TEXAU_API_KEY" \
  -H 'content-type: application/json' \
  -d '{"url":"https://acme.com"}'
{ "ok": true, "data": { "page_text": "sample", "page_title": "sample", "page_description": "sample", "page_canonical_url": "https://acme.com", "page_language_code": "sample", "crawl_loaded_url": "https://acme.com", "crawl_referrer_url": "https://acme.com" } }
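The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: the function names `build_request` and `crawl` and the 30-second timeout are assumptions, while the endpoint, headers, and body come from the cURL snippet above.

```python
import json
import urllib.request

ENDPOINT = "https://v3-api.texau.com/api/v1/apify-website-content-crawler"


def build_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with the same headers and JSON body as the cURL call."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "content-type": "application/json"},
    )


def crawl(target_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON envelope.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    The timeout value is an assumption, not a documented setting.
    """
    request = build_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `crawl("https://acme.com", os.environ["TEXAU_API_KEY"])`, reading the key from the environment so it stays out of source control.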
Compose
How this fits a workflow.
The 3 actions most operators chain after this one.
enrichment
Search LinkedIn Sales Navigator People (Apify)
Scrape LinkedIn Sales Navigator search results via Apify. Returns a list of profiles including Name, Title, Company, and Location.
enrichment
Search LinkedIn Profiles (Apify)
Search for LinkedIn profiles using filters (Current Company, Job Title, Location, Past Company).
enrichment
Get LinkedIn Posts from Profile (Apify)
Retrieves all LinkedIn posts from a specified user profile using Apify.
Output
Results land in a TexAu table.
Real result preview coming soon.
Workflow
A real example.
Trigger → Crawl Website Content (Apify) → enrich → push to your CRM. ~80 ms operator effort, the rest runs in the background.
Reliability
Rate limits & reliability.
- Per-minute limit: 30 / min
- Per-day limit: 5,000 / day
- Retries: Automatic w/ backoff
- Mode: Sync
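To stay under the per-minute limit when issuing many requests, a caller can throttle on its own side. The sketch below is a sliding-window limiter; the class name `MinuteLimiter` is hypothetical, and it only models the 30 / min cap, not the daily cap. How TexAu actually measures its window is not documented here, so treat the 60-second sliding window as an assumption.

```python
import time
from collections import deque


class MinuteLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 30, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Return seconds to wait before the next call (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        # The next slot frees up when the oldest call leaves the window.
        return self.window - (now - self.calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds if it is non-zero, then call `record()` once the request is sent.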
Errors
HTTP status codes.
What each response means and what to do about it.
| Code | Cause | Fix |
|---|---|---|
| 200 OK | Action ran. Data in `data`. | Read response. |
| 400 Bad Request | Missing or malformed input. | Validate against the input schema. |
| 401 Unauthorized | Missing or invalid `x-api-key`. | Re-issue from /api-platform. |
| 403 Forbidden | Workspace lacks plan tier. | Upgrade or contact sales. |
| 404 Not Found | Action key not recognized. | Verify the slug. |
| 429 Rate Limited | Per-minute or per-day cap hit. | Back off; reduce concurrency. |
| 500 Server Error | Unexpected TexAu issue. | Retry with backoff. |
| 502 Bad Gateway | Upstream provider 5xx. | Retry; we surface root cause. |
| 504 Timeout | Upstream slower than maxLatency. | Switch to `isAsync` polling. |
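The table above can be turned into a small client-side policy. This is a sketch under assumptions: the names `next_action` and `backoff_delay`, the action labels, and the backoff parameters (1 s base, 30 s cap, full jitter) are illustrative choices, not documented TexAu behavior. TexAu also states that it retries automatically, so client-side retries may only be needed for errors that still reach the caller.

```python
import random

# Status codes the table says to retry or back off on.
RETRYABLE = {429, 500, 502}


def next_action(status: int) -> str:
    """Map an HTTP status code to a client action, following the table."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status == 504:
        return "switch_to_async"  # the table suggests `isAsync` polling
    return "fail"  # 400, 401, 403, 404: fix the request, do not retry


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter=random.random) -> float:
    """Exponential backoff with full jitter for a zero-based attempt number.

    The delay is a random fraction of min(cap, base * 2**attempt).
    """
    return min(cap, base * 2 ** attempt) * jitter()
```

Jitter spreads retries from concurrent workers so they do not all hit the per-minute cap again at the same instant.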
Pricing
What it costs to run.
Pricing tier on /pricing. Per-action credit cost is private.
Related
More Apify actions.
enrichment
Search LinkedIn Sales Navigator People (Apify)
Scrape LinkedIn Sales Navigator search results via Apify. Returns a list of profiles including Name, Title, Company, and Location.
enrichment
Search LinkedIn Profiles (Apify)
Search for LinkedIn profiles using filters (Current Company, Job Title, Location, Past Company).
enrichment
Get LinkedIn Posts from Profile (Apify)
Retrieves all LinkedIn posts from a specified user profile using Apify.
enrichment
Search LinkedIn Companies (Apify)
Retrieve enriched LinkedIn company search results. Returns detailed company profiles including name, industries, employee counts, locations, LinkedIn URL, websit…
FAQ.
Is this real-time?
Yes. Synchronous actions return in ~1–4 s. Long-running work uses async polling (see status 504 → switch to async).
Do I get charged on failure?
No. Verified failures cost zero credits. Provider miss / 5xx / timeout cascade to the next provider in the waterfall when applicable.
Does it work with Claude / Cursor via MCP?
Yes. Add the texau MCP server to your client config, then call `texau__apify-...` directly.
What CRMs can I push results to?
HubSpot, Salesforce, Pipedrive, Zoho, and GoHighLevel are bidirectional. Smartlead, Instantly, Lemlist, HeyReach, Apollo Sequences, and Reply.io are available for outbound.
Run Crawl Website Content (Apify) in 60 seconds.
Pull your API key, paste the cURL, ship to your CRM.