AutoscaledPool
Manages a pool of asynchronous resource-intensive tasks that are executed in parallel. The pool only starts new tasks if there is enough free CPU and memory available and the Javascript event loop is not blocked.
The information about the CPU and memory usage is obtained by the Snapshotter
class, which makes regular snapshots of system
resources that may be either local or from the Apify cloud infrastructure in case the process is running on the Apify platform. Meaningful data
gathered from these snapshots is provided to AutoscaledPool
by the SystemStatus
class.
Before running the pool, you need to implement the following three functions:
AutoscaledPoolOptions.runTaskFunction()
,
AutoscaledPoolOptions.isTaskReadyFunction()
and
AutoscaledPoolOptions.isFinishedFunction()
.
The auto-scaled pool is started by calling the AutoscaledPool.run()
function. The pool periodically queries the
AutoscaledPoolOptions.isTaskReadyFunction()
function for more tasks, managing optimal
concurrency, until the function resolves to false
. The pool then queries the
AutoscaledPoolOptions.isFinishedFunction()
. If it resolves to true
, the run finishes
after all running tasks complete. If it resolves to false
, it assumes there will be more tasks available later and keeps periodically querying for
tasks. If any of the tasks throws then the AutoscaledPool.run()
function rejects the promise with an error.
The pool evaluates whether it should start a new task every time one of the tasks finishes and also in the interval set by the
options.maybeRunIntervalSecs
parameter.
Example usage:
const pool = new Apify.AutoscaledPool({
maxConcurrency: 50,
runTaskFunction: async () => {
// Run some resource-intensive asynchronous operation here.
},
isTaskReadyFunction: async () => {
// Tell the pool whether more tasks are ready to be processed.
// Return true or false
},
isFinishedFunction: async () => {
// Tell the pool whether it should finish
// or wait for more tasks to become available.
// Return true or false
},
});
await pool.run();
new AutoscaledPool(options)
Parameters:
options
:AutoscaledPoolOptions
- AllAutoscaledPool
configuration options.
autoscaledPool.log
autoscaledPool.minConcurrency
Gets the minimum number of tasks running in parallel.
Returns:
number
autoscaledPool.minConcurrency
Sets the minimum number of tasks running in parallel.
WARNING: If you set this value too high with respect to the available system memory and CPU, your code might run extremely slow or crash. If you're not sure, just keep the default value and the concurrency will scale up automatically.
Parameters:
value
:number
autoscaledPool.maxConcurrency
Gets the maximum number of tasks running in parallel.
Returns:
number
autoscaledPool.maxConcurrency
Sets the maximum number of tasks running in parallel.
Parameters:
value
:number
autoscaledPool.desiredConcurrency
Gets the desired concurrency for the pool, which is an estimated number of parallel tasks that the system can currently support.
Returns:
number
autoscaledPool.desiredConcurrency
Sets the desired concurrency for the pool, i.e. the number of tasks that should be running in parallel if there's large enough supply of tasks.
Parameters:
value
:number
autoscaledPool.currentConcurrency
Gets the the number of parallel tasks currently running in the pool.
Returns:
number
autoscaledPool.run()
Runs the auto-scaled pool. Returns a promise that gets resolved or rejected once all the tasks are finished or one of them fails.
Returns:
Promise<void>
autoscaledPool.abort()
Aborts the run of the auto-scaled pool and destroys it. The promise returned from the AutoscaledPool.run()
function
will immediately resolve, no more new tasks will be spawned and all running tasks will be left in their current state.
Due to the nature of the tasks, auto-scaled pool cannot reliably guarantee abortion of all the running tasks, therefore, no abortion is attempted and
some of the tasks may finish, while others may not. Essentially, auto-scaled pool doesn't care about their state after the invocation of .abort()
,
but that does not mean that some parts of their asynchronous chains of commands will not execute.
Returns:
Promise<void>
autoscaledPool.pause([timeoutSecs])
Prevents the auto-scaled pool from starting new tasks, but allows the running ones to finish (unlike abort, which terminates them). Used together with
AutoscaledPool.resume()
The function's promise will resolve once all running tasks have completed and the pool is effectively idle. If the timeoutSecs
argument is provided,
the promise will reject with a timeout error after the timeoutSecs
seconds.
The promise returned from the AutoscaledPool.run()
function will not resolve when .pause()
is invoked (unlike abort,
which resolves it).
Parameters:
[timeoutSecs]
:number
Returns:
Promise<void>
autoscaledPool.resume()
Resumes the operation of the autoscaled-pool by allowing more tasks to be run. Used together with
AutoscaledPool.pause()
Tasks will automatically start running again in options.maybeRunIntervalSecs
.