Generating fingerprints
Learn how to use two super handy npm libraries to generate fingerprints and inject them into a Playwright or Puppeteer page.
In Crawlee, you can use FingerprintOptions on a crawler to automatically generate fingerprints.
import { PlaywrightCrawler } from 'crawlee';
const crawler = new PlaywrightCrawler({
browserPoolOptions: {
fingerprintOptions: {
fingerprintGeneratorOptions: {
browsers: [{ name: 'firefox', minVersion: 80 }],
devices: ['desktop'],
operatingSystems: ['windows'],
},
},
},
});
Note that Crawlee will automatically generate fingerprints for you with no configuration necessary, but the option to configure them yourself is still there within browserPoolOptions.
Using the fingerprint-generator package
Crawlee uses the Fingerprint generator npm package to do its fingerprint generating magic. For maximum control outside of Crawlee, you can install it on its own. With this package, you can generate browser fingerprints.
It is crucial to generate fingerprints for the specific browser and operating system being used to trick the protections successfully. For example, if you are trying to overcome protection locally with Firefox on a macOS system, you should generate fingerprints for Firefox and macOS to achieve the best results.
import { FingerprintGenerator } from 'fingerprint-generator';
// Instantiate the fingerprint generator with
// configuration options
const fingerprintGenerator = new FingerprintGenerator({
browsers: [
{ name: 'firefox', minVersion: 80 },
],
devices: [
'desktop',
],
operatingSystems: [
'windows',
],
});
// Grab a fingerprint from the fingerprint generator
const generated = fingerprintGenerator.getFingerprint({
locales: ['en-US', 'en'],
});
Injecting fingerprints
Once you've manually generated a fingerprint using the Fingerprint generator package, it can be injected into the browser using fingerprint-injector. This tool allows you to inject fingerprints into browsers automated by Playwright or Puppeteer:
import FingerprintGenerator from 'fingerprint-generator';
import { FingerprintInjector } from 'fingerprint-injector';
import { chromium } from 'playwright';
// Instantiate a fingerprint injector
const fingerprintInjector = new FingerprintInjector();
// Launch a browser in Playwright
const browser = await chromium.launch();
// Instantiate the fingerprint generator with
// configuration options
const fingerprintGenerator = new FingerprintGenerator({
browsers: [
{ name: 'firefox', minVersion: 80 },
],
devices: [
'desktop',
],
operatingSystems: [
'windows',
],
});
// Grab a fingerprint
const generated = fingerprintGenerator.getFingerprint({
locales: ['en-US', 'en'],
});
// Create a new browser context, plugging in
// some values from the fingerprint
const context = await browser.newContext({
userAgent: generated.fingerprint.userAgent,
locale: generated.fingerprint.navigator.language,
});
// Attach the fingerprint to the newly created
// browser context
await fingerprintInjector.attachFingerprintToPlaywright(context, generated);
// Create a new page and go to Google
const page = await context.newPage();
await page.goto('https://google.com');
Note that Crawlee automatically applies wide variety of fingerprints by default, so it is not required to do this unless you aren't using Crawlee or if you need a super specific custom fingerprint to scrape with.
Generating headers
Headers are also used by websites to fingerprint users (or bots), so it might sometimes be necessary to generate some user-like headers to mitigate anti-scraping protections. Similarly with fingerprints, Crawlee automatically generates headers for you, but you can have full control by using the browser-headers-generator package.
import BrowserHeadersGenerator from 'browser-headers-generator';
const browserHeadersGenerator = new BrowserHeadersGenerator({
operatingSystems: ['windows'],
browsers: ['chrome'],
});
await browserHeadersGenerator.initialize();
const randomBrowserHeaders = await browserHeadersGenerator.getRandomizedHeaders();
Wrap up
That's it for the Mitigation course for now, but be on the lookout for future lessons! We release lessons as we write them, and will be updating the Academy frequently, so be sure to check back every once in a while for new content!