
Semalt Guide On Scraper Extension For Chrome

For any business to survive and ultimately grow, it needs to stay ahead of its competitors and anticipate risks. Making decisions based on analytical data is a reliable way to do that, and such data can be acquired through web scraping. That's where the Scraper extension for Chrome comes in: it not only simplifies data harvesting but also lets you scrape on the go without a complicated setup.

How to use Scraper

    1. The first thing you need to do is install the extension: head over to the Chrome Web Store, search for "Scraper," and click "Add to Chrome."

    2. Navigate to the website you intend to scrape and highlight the entry you are interested in. Right-click it and select "Scrape similar" from the menu that pops up.

    3. Doing so launches a separate Scraper console window, where you will see a list of the scraped data.

    4. To save the content, click "Save to Google Docs"; this automatically exports the data to a Google spreadsheet. (The sketch after this list shows what "Scrape similar" is doing behind the scenes.)
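
To make the workflow less of a black box, here is a minimal sketch in Python with lxml of what "Scrape similar" effectively does: it generalizes the element you highlighted into a pattern and collects every matching node on the page. The sample HTML, the "entry" class name, and the XPath are invented for illustration and are not produced by the extension.

```python
from lxml import html

# Invented sample page standing in for the site you right-clicked on.
SAMPLE_PAGE = """
<html><body>
  <div class="entry"><b>First post</b> <span>2017-01-02</span></div>
  <div class="entry"><b>Second post</b> <span>2017-01-09</span></div>
  <div class="entry"><b>Third post</b> <span>2017-01-16</span></div>
</body></html>
"""

doc = html.fromstring(SAMPLE_PAGE)

# One XPath that matches every entry "similar" to the one you highlighted.
for row in doc.xpath('//div[@class="entry"]'):
    # text_content() flattens each entry into one string, much like the
    # unrefined list shown in the Scraper console.
    print(row.text_content().strip())
```

Note that each printed line lumps the title and the date together, which is exactly the kind of result the next section refines with XPath.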

Extended scraping

If you plan to scrape more data, you can use a more advanced approach. Note that the tool is much easier to work with if you have some knowledge of HTML. Suppose you want to scrape a source whose archive is organized as time-series data. In that case, the method described above would return garbled results.

To solve this, you can use XPath, a query language for HTML and XML documents. What does it do? XPath identifies each element in your selection by where it sits in the page's markup, so you can tell Scraper exactly which nodes to extract. The sketch below illustrates the idea, and the numbered guide that follows shows how to apply it inside Scraper.
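
As a quick, self-contained illustration (again in Python with lxml, on markup made up for this example), the positional path below is the kind of XPath Scraper generates: each bracketed number picks an element by its position among its siblings.

```python
from lxml import html

# Invented markup: three <div> blocks under <body>, the third one nested.
doc = html.fromstring("""
<html><body>
  <div>header</div>
  <div>sidebar</div>
  <div>
    <div>archive title</div>
    <div>2017-01-02 entry</div>
  </div>
</body></html>
""")

# //body/div[3] selects the third <div> under <body>, and /div[2] the second
# <div> nested inside it: the same positional logic as //div[3]/div[2]/div.
print(doc.xpath('//body/div[3]/div[2]/text()'))  # ['2017-01-02 entry']
```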

1. Go to the Scraper console. In the upper left you should notice an "XPath" button; click it and proceed to assemble the initial table.

2. You need to write the XPath for the right element. The current XPath, which covers the whole selection, is displayed in a format like "//div[3]/div[3]/div[2]/div"; each step in the path refers to a <div> element in the page's HTML document.

3. To separate the recognized data, use the Scraper columns. Start by identifying the different types of information available. Depending on the data you are scraping, each entry may have a title, and these titles sit next to every set of data inside their own tag, in this case a <b> tag.

4. Using Inspect Element, locate the <b> tag and add it to your XPath. You can then label this first column "Title," since it will list the titles. Create a separate XPath for each column you need (the code sketch after this list mirrors the same idea).

5. Click "Scrape" and the extension will automatically harvest the data and organize it into the columns you have set up.
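
To make the column idea concrete, here is a rough sketch of the same approach outside the browser, again with Python and lxml. The "entry" class name, the <b> title tag, and the dates are assumptions invented for this example, not data from any real page.

```python
from lxml import html

# Invented archive page: each entry carries a <b> title and a <span> date.
SAMPLE_PAGE = """
<html><body>
  <div class="entry"><b>Quarterly report</b><span>2017-03-31</span></div>
  <div class="entry"><b>Annual summary</b><span>2016-12-31</span></div>
</body></html>
"""

doc = html.fromstring(SAMPLE_PAGE)

rows = []
for entry in doc.xpath('//div[@class="entry"]'):
    rows.append({
        # "Title" column: a relative XPath into the <b> tag, as in step 4.
        "Title": entry.xpath('./b/text()')[0],
        # A second column with its own XPath, one per piece of data you need.
        "Date": entry.xpath('./span/text()')[0],
    })

for row in rows:
    print(row["Title"], "|", row["Date"])
```

The result is a clean title column and a date column, the same separation the extension produces once every column has its own XPath.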