Using man-in-the-middle proxy to intercept requests in Puppeteer
Sometimes you may need to intercept (or maybe block) requests in headless Chrome / Puppeteer, but page.setRequestInterception()
is not 100% reliable when the request is started in a new window.
One possible way to intercept these requests is to use a man-in-the-middle (MITM) proxy, i.e. a proxy server that can intercept and modify HTTP requests, even those over HTTPS. In this example, we're going to use https://github.com/joeferner/node-http-mitm-proxy, since it has all the tools that we need.
First we set up the MITM proxy:
const { promisify } = require('util');
const { exec } = require('child_process');
const Proxy = require('http-mitm-proxy');
const Promise = require('bluebird');
const execPromise = promisify(exec);
const wait = (timeout) => new Promise((resolve) => setTimeout(resolve, timeout));
const setupProxy = async (port) => {
// Setup chromium certs directory
// WARNING: this only works in debian docker images
// modify it for any other use cases or local usage.
await execPromise('mkdir -p $HOME/.pki/nssdb');
await execPromise('certutil -d sql:$HOME/.pki/nssdb -N');
const proxy = Proxy();
proxy.use(Proxy.wildcard);
proxy.use(Proxy.gunzip);
return new Promise((resolve, reject) => {
proxy.listen({ port, silent: true }, (err) => {
if (err) return reject(err);
// Add CA certificate to chromium and return initialize proxy object
execPromise('certutil -d sql:$HOME/.pki/nssdb -A -t "C,," -n mitm-ca -i ./.http-mitm-proxy/certs/ca.pem')
.then(() => resolve(proxy))
.catch(reject);
});
});
};
Then we'll need a Docker image that has the certutil
utility. Here is an example of a Dockerfile that can create such an image and is based on the apify/actor-node-chrome image that contains Puppeteer.
Now we need to specify how the proxy shall handle the intercepted requests:
// Setup blocking of requests in proxy
const proxyPort = 8000;
const proxy = setupProxy(proxyPort);
proxy.onRequest((context, callback) => {
if (blockRequests) {
const request = context.clientToProxyRequest;
// Log out blocked requests
console.log('Blocked request:', request.headers.host, request.url);
// Close the connection with custom content
context.proxyToClientResponse.end('Blocked');
return;
}
return callback();
});
The final step is to let Puppeteer use the local proxy:
// Launch puppeteer with local proxy
const browser = await puppeteer.launch({
args: ['--no-sandbox', `--proxy-server=localhost:${proxyPort}`],
});
And we're done! By adjusting the blockRequests
variable, you can allow or block any request initiated through Puppeteer.
Here is a GitHub repository with a full example and all necessary files: https://github.com/apify/actor-example-proxy-intercept-request
If you have any questions, feel free to contact us in the chat.
Happy intercepting!