
Extracting data with DevTools

Continue learning how to extract data from a website using browser DevTools, CSS selectors, and JavaScript via the DevTools console.


In the previous parts of the DevTools tutorial, we were able to extract information about a single product from the Sales collection of the Warehouse store. If you missed the previous lessons, please go through them to understand the basic concepts. You don't need any of the code from there, though. We will start from scratch.

Find all product elements

First, we will use the querySelectorAll() function from the previous lessons to get a list of all the product elements.

Run this command in your Console:

const products = document.querySelectorAll('.product-item');
products.length;

The length property of products tells us how many products we have in the list. It says 24, and if you count the products on the page, you'll find that it's correct. That means our CSS selector matches every product on the page.


Looping over elements

Visit this tutorial if you need to refresh the concept of loops in programming.

Now, we will loop over the products and print each product's title. We will use a so-called for...of loop to do it. It is a loop that iterates over all items of an iterable object, such as an array or the NodeList returned by querySelectorAll().

Run the following command in the Console. Some notes:

  • The a.product-item__title selector and the extraction code come from the previous lesson.
  • The console.log() function prints the results to the Console.
  • The trim() function removes whitespace characters from the start and end of the text.

for (const product of products) {
    const titleElement = product.querySelector('a.product-item__title');
    const title = titleElement.textContent.trim();
    console.log(title);
}

Learn more about the for...of loop.
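
The loop above assumes that every product element contains a title link. If one of them doesn't, querySelector() returns null and reading textContent from it throws a TypeError. Below is a minimal sketch of a more defensive version. The helper name extractTitles is hypothetical (it is not part of the tutorial), and it accepts any iterable of elements.

```javascript
// Hypothetical helper: collects trimmed titles and skips products
// that have no matching title element.
function extractTitles(products) {
    const titles = [];
    for (const product of products) {
        const titleElement = product.querySelector('a.product-item__title');
        // querySelector() returns null when nothing matches the selector.
        if (!titleElement) continue;
        titles.push(titleElement.textContent.trim());
    }
    return titles;
}
```

In the Console, you could call extractTitles(products) with the products list from the previous step and print the returned array with console.log().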


Extracting more data

We will add the price extraction from the previous lesson to the loop. We will also save all the data to an array so that we can work with it.

The results.push() function takes its argument and adds it to the end of the results array. Learn more about it here.

Run this in the Console:

// Collects one { title, price } object per product.
const results = [];

for (const product of products) {
    const titleElement = product.querySelector('a.product-item__title');
    const title = titleElement.textContent.trim();

    // Same price extraction as in the previous lesson.
    const priceElement = product.querySelector('span.price');
    const price = priceElement.childNodes[2].nodeValue.trim();

    results.push({ title, price });
}

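After running the code, you'll see 24 printed to the Console. That's because results.push() returns the new length of the array, and the Console prints the value of the last statement it executed. The results array now includes 24 products.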

Now, run this command in the Console to print all the products:

console.log(results);
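
The Console displays the array as an expandable object, which is awkward to paste into another tool. One option, sketched here as an illustration, is to serialize the data to JSON text first. The helper name toJson and the sample record are assumptions made for this example, not part of the tutorial.

```javascript
// Hypothetical helper: turns the scraped records into pretty-printed JSON text.
function toJson(records) {
    // The third argument indents nested values by two spaces.
    return JSON.stringify(records, null, 2);
}

// Made-up sample data standing in for the `results` array.
const sampleResults = [{ title: 'Example product', price: '$10.00' }];
console.log(toJson(sampleResults));
```

In the Console, you would pass the real results array instead of the sample data.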


You may notice that some prices include the word From, indicating that the price is not final. If you wanted to process this data further, you would want to remove the word from the price and save that information to a separate field instead.
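
One way to separate the From marker from the price could look like the sketch below. The helper name parsePrice, the field name isFromPrice, and the sample price strings are all assumptions made for illustration. The sketch also assumes that a non-final price always starts with the word From followed by a space.

```javascript
// Hypothetical helper: separates the "From" marker from the price text.
function parsePrice(rawPrice) {
    const text = rawPrice.trim();
    // Assumption: a non-final price starts with "From" followed by whitespace.
    const isFromPrice = /^From\s/i.test(text);
    const price = isFromPrice ? text.replace(/^From\s+/i, '') : text;
    return { price, isFromPrice };
}

// Made-up sample inputs in the format the prices appear to use.
console.log(parsePrice('From $1,398.00')); // price: '$1,398.00', isFromPrice: true
console.log(parsePrice('$74.95'));         // price: '$74.95', isFromPrice: false
```

Inside the loop, you could then push { title, ...parsePrice(price) } instead of { title, price }, so that each record carries both fields.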

Summary

Let's recap the web scraping process. First, we used DevTools to find the element that holds data about a single product. Then, inside this parent element we found child elements that contained the data (title, price) we were looking for.

Second, we used the document.querySelector() function and its querySelectorAll() variant to find those elements programmatically, using their CSS selectors.

And third, we wrapped this data extraction logic in a loop to automatically find the data not only for a single product, but for all the products on the page. 🎉

Next up

And that's it! With a bit of trial and error, you will be able to extract data from any webpage that's loaded in your browser. This is a useful skill on its own. It will save you time copy-pasting stuff when you need data for a project.

More importantly, though, it taught you the basics you need to start programming your own scrapers. In the next lessons, we will teach you how to create your own web data extraction script using JavaScript and Node.js.