Apify — Examples — BizFirstAI

Example 1: Scrape a Product Page and Extract Structured Data

Scenario: Scrape a competitor's product page in a single call and pull out the title, price, and description without writing any actor code.

{
  "resource": "actor",
  "operation": "scrapeSingleUrl",
  "api_token": "{{ credentials.apify_token }}",
  "url": "{{ vars.competitor_product_url }}",
  "wait_for_selector_css": ".product-price",
  "extract_html": false,
  "extract_text": true,
  "take_screenshot": true,
  "screenshot_width": 1280
}

Expected outcome: The success port fires with text containing the full page text and screenshotBase64 containing a PNG of the page. Connect a DataMapping node to parse out the price and title using regex or string expressions.
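The parsing step described above can be sketched in Python. This is only an illustration of the regex approach, not the DataMapping node's own expression syntax. The sample text, the helper names `extract_price` and `extract_title`, and the rule that the first non-empty line is the title are all assumptions made for the example.

```python
import re

# Hypothetical sample of the `text` field; real page text will differ.
sample_text = "Acme Widget Pro\nIn stock\nPrice: $1,299.99\nFree shipping"

def extract_price(text):
    """Return the first currency amount in the text as a float, or None."""
    match = re.search(r"[$€£]\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))

def extract_title(text):
    """Treat the first non-empty line as the title (an assumption)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None

price = extract_price(sample_text)
title = extract_title(sample_text)
```

Because the whole page text is searched, the first currency amount found may not be the product price on pages that list several prices. Narrowing the text to the region around the price element first would make the match more reliable.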

Example 2: Run a Full Website Crawler Actor and Poll for Completion

Scenario: Launch Apify's apify/website-content-crawler to crawl a site (here capped at 200 pages and a crawl depth of 3), then poll every 30 seconds until the run completes before processing results.

// Step 1 — actor/run node
{
  "resource": "actor",
  "operation": "run",
  "api_token": "{{ credentials.apify_token }}",
  "actor_id": "apify/website-content-crawler",
  "memory_mb": 1024,
  "timeout_seconds": 600,
  "input_json": "{\"startUrls\":[{\"url\":\"{{ vars.target_website }}\"}],\"maxCrawlDepth\":3,\"maxCrawlPages\":200}"
}

// Step 2 — Delay node (30 seconds)

// Step 3 — actorRun/getRun node (loop until status is "SUCCEEDED", "FAILED", "TIMED-OUT", or "ABORTED"; a new run can still report "READY" before it starts)
{
  "resource": "actorRun",
  "operation": "getRun",
  "api_token": "{{ credentials.apify_token }}",
  "run_id": "{{ nodes.step1.id }}"
}

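Expected outcome: When the loop exits with status "SUCCEEDED", route to dataset/getItems using defaultDatasetId from the run object to retrieve all crawled pages. Treat any other final status ("FAILED", "TIMED-OUT", or "ABORTED") as a failure and route it to error handling.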
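The Delay and getRun loop can be sketched as a small polling function. This is a minimal sketch, not the platform's implementation: `get_run` is a hypothetical callable standing in for the actorRun/getRun node, and `max_attempts` is an assumed safety limit so the loop cannot spin forever. The loop stops only on a final status, so a run that still reports "READY" keeps being polled.

```python
import time

# Statuses after which an Apify run no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}

def poll_until_finished(get_run, interval_seconds=30, max_attempts=40, sleep=time.sleep):
    """Call get_run() until the run reaches a terminal status or attempts run out.

    get_run is a hypothetical stand-in for the actorRun/getRun node and must
    return a dict with at least a "status" key.
    """
    for _ in range(max_attempts):
        run = get_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        sleep(interval_seconds)
    raise TimeoutError("run did not finish within the polling budget")
```

With the defaults, the polling budget is 40 attempts at 30 seconds each, which comfortably covers the 600 second run timeout used in Step 1.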

Example 3: Get Items from a Dataset Populated by a Completed Run

Scenario: A dataset was populated by an earlier actor run. Read the first 100 items from that dataset and load them into a Loop node for per-record processing.

{
  "resource": "dataset",
  "operation": "getItems",
  "api_token": "{{ credentials.apify_token }}",
  "dataset_id": "{{ vars.dataset_id }}",
  "clean": true,
  "limit": 100,
  "offset": 0,
  "fields": "title,price,url,sku"
}

Expected outcome: The success port returns an items array of up to 100 objects, each containing only the four requested fields. Feed this into a Loop node to process each product record individually.
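When a dataset holds more than one page of results, limit and offset can be advanced together to read it in chunks. The sketch below illustrates that pattern under stated assumptions: `get_items` is a hypothetical callable standing in for the dataset/getItems node, and the loop stops on the first empty page rather than on a short page, since a cleaned result set may return fewer items than the limit.

```python
def fetch_all_items(get_items, page_size=100):
    """Collect every item by advancing offset until an empty page comes back.

    get_items is a hypothetical stand-in for dataset/getItems; it takes
    limit and offset keyword arguments and returns a list of items.
    """
    items = []
    offset = 0
    while True:
        page = get_items(limit=page_size, offset=offset)
        if not page:
            return items
        items.extend(page)
        offset += page_size
```

Stopping on an empty page costs one extra request at the end, in exchange for not ending early when a page comes back shorter than requested.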

Example 4: Schedule Actor Runs and Process Results

Scenario: A ScheduledTrigger fires daily at 6 AM. Run a price monitoring actor, wait for results, and write new prices to MongoDB only if they changed.

// Triggered by ScheduledTrigger (daily 06:00)

// Apify node — actor/runAndGetDatasetItems
{
  "resource": "actor",
  "operation": "runAndGetDatasetItems",
  "api_token": "{{ credentials.apify_token }}",
  "actor_id": "apify/cheerio-scraper",
  "memory_mb": 512,
  "timeout_seconds": 300,
  "max_items": 500,
  "input_json": "{\"startUrls\":[{\"url\":\"{{ vars.pricing_page }}\"}],\"pageFunction\":\"async ({ $, request }) => ({ product: $('h1').text(), price: $('.price').text(), url: request.url })\"}"
}

// Loop node over items array
// → IfCondition: items[i].price !== vars.last_known_prices[items[i].product]
//   → MongoDB/insertOne to record price change with timestamp

Expected outcome: Each morning, fresh pricing data is collected and only changed prices are written to MongoDB, building a historical price change log for analytics.
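The IfCondition inside the loop amounts to comparing each scraped price with the last known value for that product. A minimal sketch of that comparison follows. The function name `changed_prices`, the whitespace trimming, and the shape of the returned records are assumptions for illustration; the field names `product`, `price`, and `url` come from the pageFunction above.

```python
def changed_prices(items, last_known_prices):
    """Return one record per item whose price differs from the last known value.

    Products missing from last_known_prices count as changed, which mirrors
    the strict inequality check in the IfCondition.
    """
    changes = []
    for item in items:
        # Scraped text often carries stray whitespace, so trim before comparing.
        product = item["product"].strip()
        price = item["price"].strip()
        previous = last_known_prices.get(product)
        if previous != price:
            changes.append({
                "product": product,
                "price": price,
                "previous": previous,
                "url": item.get("url"),
            })
    return changes
```

Prices are compared as strings here, so "$10" and "$10.00" would register as a change. Normalising both sides to numbers before comparing avoids that kind of false positive.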

Example 5: Use actorTask for Pre-configured Scrape Jobs

Scenario: Your team created a saved Apify task that bundles a scraper actor with your site-specific configuration. Invoke it via task ID so no input JSON is needed at runtime.

{
  "resource": "actorTask",
  "operation": "runAndGetDatasetItems",
  "api_token": "{{ credentials.apify_token }}",
  "task_id": "{{ vars.apify_task_id }}",
  "max_items": 200
}

// Override task input for this specific run only (optional):
{
  "resource": "actorTask",
  "operation": "runAndGetDatasetItems",
  "api_token": "{{ credentials.apify_token }}",
  "task_id": "{{ vars.apify_task_id }}",
  "input_json": "{\"startDate\":\"{{ vars.report_start_date }}\",\"endDate\":\"{{ vars.report_end_date }}\"}",
  "max_items": 1000
}

Expected outcome: The task runs with its saved actor configuration. The first variant uses the saved input as-is. The second variant passes a date range override, which is useful for parameterised report generation without exposing the full actor configuration to the workflow.
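Because input_json is a string that itself contains JSON, hand-written escaping is easy to get wrong. If the node configuration is generated by a script, the override can be serialised instead of typed out. This is a sketch under assumptions: `build_input_json` is a hypothetical helper, and the task ID and dates are placeholder values.

```python
import json

def build_input_json(start_date, end_date):
    """Serialise a date-range override into the string form input_json expects."""
    override = {"startDate": start_date, "endDate": end_date}
    return json.dumps(override)

node_config = {
    "resource": "actorTask",
    "operation": "runAndGetDatasetItems",
    "task_id": "my-task-id",  # hypothetical placeholder
    "input_json": build_input_json("2024-01-01", "2024-01-31"),
    "max_items": 1000,
}
```

Serialising the override this way takes care of quoting and escaping, so values that contain quotes or backslashes do not break the payload.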