This step-by-step tutorial shows you how to monitor and ensure consistency in named datasets that aggregate data from multiple actors or tasks. The goal of this setup is to get:
- Data that is always in the correct format.
- Alerts if items are duplicated.
- A notification when your scheduled run finishes successfully.
- Data visualization on a simple dashboard.
For this use case, we will imagine you want to scrape fresh jokes from two websites and store them in a single named dataset daily.
You have created two tasks from Web Scraper (apify/web-scraper) and set them to save their results in the desired dataset. Next, you need to validate your data to make sure it fits your needs. Instead of writing separate software for this, you can use our monitoring suite.
Each of the above tasks handles a different website. After the tasks finish successfully, they call the monitoring actor using a webhook that handles the data aggregation.
The two extraction tasks are scheduled to run every day using the @daily cron expression. They produce a new named dataset each day. The naming convention for the dataset is
Now, to the monitoring part. For this tutorial, let's skip the monitoring of the tasks and jump right to the dataset.
If you haven't already, add the monitoring suite to your account.
If you have already added the task, under its Settings tab, give it a name. For example, monitoring-jokes.
We recommend prefixing your monitoring task names with monitoring- so you can identify them more easily.
Next, we will configure the monitoring suite.
Under your task's Input tab, set the Mode dropdown to Create configuration.
Next, open the What you want to monitor section. Give the monitoring suite a name in the Monitoring suite name field, e.g. daily-jokes.
In the Type of target: dropdown, select Dataset, since you will be monitoring a shared dataset.
Target name patterns should be the name of your dataset, DAILY-JOKES. If you want a stricter pattern, you can use ^DAILY-JOKES, which only matches names that start with DAILY-JOKES.
Select the Notify me whenever actor/task does not succeed option to receive a report when a run finishes unsuccessfully.
Each of your monitoring suites must have a unique name.
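To see how the two target name patterns mentioned above differ, here is a minimal sketch. It assumes the patterns are applied as regular expressions against dataset names, which the ^ prefix suggests but this tutorial does not state outright. The dataset names in the list are hypothetical and serve only as illustration.

```javascript
// Hypothetical dataset names, used only for illustration.
const datasetNames = ['DAILY-JOKES', 'DAILY-JOKES-ARCHIVE', 'OLD-DAILY-JOKES'];

// Assumption: a target name pattern is treated as a regular expression.
const loosePattern = new RegExp('DAILY-JOKES');   // matches anywhere in the name
const strictPattern = new RegExp('^DAILY-JOKES'); // matches only at the start of the name

const looseMatches = datasetNames.filter((name) => loosePattern.test(name));
const strictMatches = datasetNames.filter((name) => strictPattern.test(name));

console.log(looseMatches);  // all three names contain DAILY-JOKES
console.log(strictMatches); // OLD-DAILY-JOKES is excluded
```

Under that assumption, the stricter pattern keeps the suite from picking up datasets that merely contain DAILY-JOKES somewhere in their name.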
Your configuration will look like this:
Now, let's ensure that your jokes are in the correct form. Each joke's dataset item should contain a title and a text. Both values should be strings.
Open the Validating by a schema section and select the Enable schema validation option.
In the Validation options field, create an object containing a schema key. As its value, set an object specifying the format of each of the properties you want to validate.
Make sure you set Validation frequency to something other than Per run, because datasets don't have runs. You can use natural-language cron expressions, so in this instance you can set the frequency to Every day at noon.
The monitoring suite uses the ow library for type validation. Make sure to import the library using /* global ow */.
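As a sketch of what the Validation options value could look like for the jokes dataset, assuming the field accepts a JavaScript object literal and that ow's string predicate (ow.string) is used for each property. This is a fragment meant for the input field, not a standalone script, and the exact syntax the field expects may differ.

```javascript
/* global ow */
{
    schema: {
        // Assumption: each property maps to an ow predicate.
        // Every joke must have a string title and a string text.
        title: ow.string,
        text: ow.string,
    },
}
```

With a schema along these lines, an item with a missing title or a non-string text would fail validation.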
In the Check for duplicates section, select the Enable duplicate items check option.
Set the first Unique keys field to title. Click Add and set the second field to text. This will ensure that the titles and jokes are unique.
Just like with validation frequency, set the Check frequency to something other than Per run (check step 3 for tips).
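To illustrate what a duplicate check over the title and text keys amounts to, here is a minimal standalone sketch. It is not the monitoring suite's implementation: the findDuplicates helper and the sample jokes are hypothetical, and it assumes each unique key is checked independently of the others.

```javascript
// Hypothetical helper: for each unique key, collect the values
// that appear in more than one item.
function findDuplicates(items, uniqueKeys) {
    const duplicates = {};
    for (const key of uniqueKeys) {
        const seen = new Set();
        const repeated = new Set();
        for (const item of items) {
            const value = item[key];
            if (seen.has(value)) repeated.add(value);
            seen.add(value);
        }
        duplicates[key] = [...repeated];
    }
    return duplicates;
}

// Hypothetical dataset items, used only for illustration.
const jokes = [
    { title: 'Chicken', text: 'Why did the chicken cross the road?' },
    { title: 'Bar', text: 'A horse walks into a bar.' },
    { title: 'Chicken', text: 'Another chicken joke.' },
];

const result = findDuplicates(jokes, ['title', 'text']);
console.log(result); // the title 'Chicken' is reported; no text is duplicated
```

Checking each key separately flags a repeated title even when the joke text differs, which matches the goal of keeping both titles and jokes unique.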
You can configure data visualization in the Statistics dashboard section. To enable it, check the Enable dashboard option.