Apify — Overview — BizFirstAI

What Is Apify?

Apify is a cloud platform for web scraping, browser automation, and data extraction. It provides a marketplace of ready-made Actors — containerized automation programs — that can scrape websites, extract structured data, monitor pages for changes, or perform any headless-browser task. Each actor run produces output in a Dataset (structured rows) and a Key-Value Store (arbitrary files and blobs). Apify manages compute, scheduling, proxies, and storage so you can focus on the data.

The BizFirstAI Apify node gives workflows direct access to 11 operations across 5 resource types.

Authentication: All operations require an Apify API Token. Generate one at https://console.apify.com/account/integrations. Store it as a BizFirstAI credential and reference it via credentialId in the node configuration. The token is passed with each call, so the node keeps no state between calls and is safe to use across tenants.

Resources & Operations

Resource	Operation (config key)	Page	Summary
actor	`run`	actor/run	Start an actor run. Optionally wait for completion. Returns run metadata including the default dataset ID.
actor	`get-last-run`	actor/getLastRun	Retrieve the most recent run for an actor, with optional status filter.
actor	`run-and-get-dataset-items`	actor/runAndGetDatasetItems	Run an actor, wait for it to finish, and return its output dataset items in a single step.
actor	`scrape-single-url`	actor/scrapeSingleUrl	Scrape one URL using Apify's built-in scraper. Choose between `cheerio` and `playwright`.
actor-task	`run`	actor-task/run	Run a saved Actor Task: a preconfigured actor run with stored input.
actor-task	`run-and-get-dataset-items`	actor-task/runAndGetDatasetItems	Run an Actor Task and return its dataset output in one step.
actor-run	`get-run`	actor-run/get	Fetch current status and metadata for a specific run by Run ID.
actor-run	`get-actor-runs`	actor-run/getActorRuns	List all runs for a specific actor with pagination and status filter.
actor-run	`get-user-runs-list`	actor-run/getUserRunsList	List all runs across all actors for the authenticated user.
dataset	`get-items`	dataset/getItems	Retrieve paginated items from any Apify dataset by Dataset ID.
key-value-store	`get-record`	key-value-store/getRecord	Fetch a single record from a Key-Value Store. Handles JSON, text, and binary content types.

Authentication Setup

All Apify operations authenticate via a single API token. The node reads this token from credentialId (preferred) or falls back to the literal apiToken config key.

Field	Required	Description
`credentialId`	Required	BizFirstAI credential reference holding your Apify API token. Generate the token at Apify Console → Settings → Integrations. The node also accepts a literal `apiToken` key for quick testing.
`baseUrl`	Optional	Override the Apify API base URL. Defaults to `https://api.apify.com`. Only needed for on-premise or proxied Apify deployments.
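
The lookup order described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name `resolve_auth` and the `credentials` mapping (a stand-in for the BizFirstAI credential store) are hypothetical, and falling back to `apiToken` when a `credentialId` does not resolve is an assumption.

```python
from typing import Mapping, Tuple

DEFAULT_BASE_URL = "https://api.apify.com"


def resolve_auth(config: Mapping[str, str],
                 credentials: Mapping[str, str]) -> Tuple[str, str]:
    """Return (token, base_url) for a single call.

    `credentials` is a hypothetical stand-in for the credential store,
    mapping a credentialId to the stored Apify API token.
    """
    token = None
    credential_id = config.get("credentialId")
    if credential_id:
        # Preferred path: look the token up by credential reference.
        token = credentials.get(credential_id)
    if not token:
        # Fallback path: literal token in the node config (quick testing).
        token = config.get("apiToken")
    if not token:
        raise ValueError("No Apify API token: set credentialId or apiToken")
    # baseUrl is optional; strip a trailing slash so paths join cleanly.
    base_url = (config.get("baseUrl") or DEFAULT_BASE_URL).rstrip("/")
    return token, base_url
```

Because the token is resolved on every call, nothing has to be cached between calls.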

Token scope: Use a dedicated token per environment (development, staging, production). Apify tokens are account-scoped — a compromised token exposes all actors, datasets, and key-value stores. Rotate tokens via the Apify Console and update the BizFirstAI credential record to propagate the change instantly across all workflows.

Key Apify Concepts

Actors

An Actor is a cloud program (Docker container) that performs a task: scraping a website, crawling sitemaps, extracting structured data, or any headless-browser automation. Identify an actor with its actorId, which takes the form username~actor-name (e.g. apify~web-scraper) or a raw actor ID string (e.g. moJRLRc85AitArpNN).
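
To make the two actorId forms concrete, here is a small sketch that tells them apart. The helper name `split_actor_id` is hypothetical, and the validation is deliberately loose: it only checks for the `~` separator and non-empty parts.

```python
from typing import Optional, Tuple


def split_actor_id(actor_id: str) -> Tuple[Optional[str], str]:
    """Split an actorId into (username, name_or_raw_id).

    'apify~web-scraper'  -> ('apify', 'web-scraper')
    'moJRLRc85AitArpNN'  -> (None, 'moJRLRc85AitArpNN')
    """
    actor_id = actor_id.strip()
    if not actor_id:
        raise ValueError("actorId must not be empty")
    if "~" in actor_id:
        username, _, name = actor_id.partition("~")
        if not username or not name:
            raise ValueError(f"Malformed actorId: {actor_id!r}")
        return username, name
    # No separator: treat the value as a raw actor ID string.
    return None, actor_id
```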

Actor Tasks

An Actor Task is a saved, named configuration for a specific actor. It bundles the actor ID with a preset input, memory, and timeout so workflows stay clean. Tasks are identified by actorTaskId (e.g. my-org~daily-product-scrape).

Runs

Each actor or task invocation creates a Run with a unique runId. A run transitions through statuses: READY → RUNNING → SUCCEEDED / FAILED / TIMED-OUT / ABORTED. Each run automatically gets a Default Dataset and a Default Key-Value Store for its output.
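
The status lifecycle above can be modeled as a small enum with a terminal-state check, which is handy when deciding whether to keep polling. The names `RunStatus` and `is_terminal` are illustrative, and treating any status not listed on this page as non-terminal is an assumption.

```python
from enum import Enum


class RunStatus(str, Enum):
    READY = "READY"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED-OUT"  # the wire value uses a hyphen
    ABORTED = "ABORTED"


# Statuses after which a run will not change any further.
TERMINAL_STATUSES = frozenset({
    RunStatus.SUCCEEDED,
    RunStatus.FAILED,
    RunStatus.TIMED_OUT,
    RunStatus.ABORTED,
})


def is_terminal(status: str) -> bool:
    """True if the run has finished, successfully or not."""
    try:
        return RunStatus(status) in TERMINAL_STATUSES
    except ValueError:
        # Assumption: statuses not listed here are still in progress.
        return False
```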

Datasets

A Dataset is an append-only list of JSON objects — the primary output of most scrapers. Fetch items via dataset/getItems using the defaultDatasetId returned by any run operation, or use the combined runAndGetDatasetItems operations to get data in one step.
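
A dataset larger than one page can be read by advancing an offset until a short page comes back. The sketch below assumes a caller-supplied `fetch_page(offset, limit)` function, a hypothetical stand-in for one dataset/getItems call that returns the list of items for that page.

```python
from typing import Callable, Iterator, List


def iter_dataset_items(fetch_page: Callable[[int, int], List[dict]],
                       limit: int = 100) -> Iterator[dict]:
    """Yield every item in a dataset, one page at a time."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        items = fetch_page(offset, limit)
        yield from items
        # A page shorter than `limit` means the dataset is exhausted.
        if len(items) < limit:
            break
        offset += limit
```

If the item count is an exact multiple of `limit`, the loop makes one extra call that returns an empty page and then stops.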

Key-Value Stores

A Key-Value Store holds arbitrary records (JSON objects, text, images, HTML). Each run's default KV store holds the actor's INPUT record and, for actors that write one, an OUTPUT record. Retrieve any record by store ID and record key using key-value-store/getRecord.
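
Since a record can be JSON, text, or binary, a consumer usually branches on the content type. The following is a rough sketch of that branching, assuming UTF-8 for textual content and base64 for everything else; `decode_record` is a hypothetical helper, not the node's own code.

```python
import base64
import json
from typing import Any


def decode_record(content_type: str, body: bytes) -> Any:
    """Turn a raw KV store record into a workflow-friendly value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/json" or mime.endswith("+json"):
        return json.loads(body.decode("utf-8"))
    if mime.startswith("text/"):
        return body.decode("utf-8")
    # Anything else (images, archives, ...) is returned as base64 text.
    return base64.b64encode(body).decode("ascii")
```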

Memory & Timeout

Apify bills by compute units (CUs), calculated as memory × time: one CU is 1 GB of memory used for one hour. Valid memory values in MB: 128, 256, 512, 1024 (default), 2048, 4096, 8192, 16384, 32768. The timeout field accepts 1–86400 seconds (24-hour maximum). Memory values outside the valid set fall back to the 1024 MB default.
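
The memory and timeout rules, together with the compute-unit arithmetic, can be sketched in a few lines. The function names are illustrative. Rejecting an out-of-range timeout with an error is an assumption, since this page does not say how the node handles one.

```python
VALID_MEMORY_MB = (128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768)
DEFAULT_MEMORY_MB = 1024
MAX_TIMEOUT_SECS = 86400  # 24 hours


def normalize_memory(memory_mb: int) -> int:
    """Return memory_mb if it is a valid size, else the 1024 MB default."""
    return memory_mb if memory_mb in VALID_MEMORY_MB else DEFAULT_MEMORY_MB


def validate_timeout(timeout_secs: int) -> int:
    """Accept 1..86400 seconds; raising on other values is an assumption."""
    if not 1 <= timeout_secs <= MAX_TIMEOUT_SECS:
        raise ValueError("timeout must be between 1 and 86400 seconds")
    return timeout_secs


def compute_units(memory_mb: int, duration_secs: float) -> float:
    """CUs = memory in GB x duration in hours."""
    return (memory_mb / 1024) * (duration_secs / 3600)
```

For example, a run with 4096 MB that lasts 15 minutes uses 4 GB × 0.25 h = 1 CU, the same as a 1024 MB run that lasts a full hour.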

Common Workflow Patterns

Synchronous scrape: Use actor/runAndGetDatasetItems — one node, immediate results, no polling required.
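Async fire-and-forget: Use actor/run without waiting, store the runId in a variable, then poll with actor-run/get inside a Loop + Delay until runStatus reaches a terminal state. SUCCEEDED means the output is ready; FAILED, TIMED-OUT, and ABORTED should also end the loop and be handled as errors, otherwise the workflow polls forever.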
Scheduled data pipeline: Use actor-task/runAndGetDatasetItems on a cron trigger — the task holds configuration, the workflow processes the data.
Paginated fetch: Chain multiple dataset/getItems nodes with incrementing offset values, or use a Loop node until itemCount is less than limit.
Output inspection: After any run, use key-value-store/getRecord with key OUTPUT to read structured output from the actor's default KV store.
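
The polling pattern from the list above, written out as code for clarity. This is a sketch under stated assumptions: `get_run` is a hypothetical callable standing in for one actor-run/get call, it is assumed to return a dict with a `runStatus` field, and the delay and poll limit are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(get_run: Callable[[str], dict],
                 run_id: str,
                 delay_secs: float = 5.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a run until it reaches a terminal status, then return it."""
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["runStatus"] in TERMINAL_STATUSES:
            return run
        sleep(delay_secs)  # the "Delay" step of the Loop + Delay pattern
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

The caller still has to check whether the returned status is SUCCEEDED before reading the dataset. Capping the number of polls keeps a stuck run from blocking the workflow indefinitely.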

actor/run

Start an actor and get run metadata. Use waitForFinish to block until done, or fire-and-forget and poll with actor-run/get.

actor/runAndGetDatasetItems

The most common scraping operation: run, wait, and return dataset items, all in one node.

actor-task/runAndGetDatasetItems

Run a saved task and return dataset output. Best for scheduled pipelines with stable actor configurations.

dataset/getItems

Fetch paginated items from any dataset by ID. Supports field projection and omission.

key-value-store/getRecord

Retrieve a KV store record. Automatically handles JSON, text, and binary (base64) content types.