Extract star rating information
Updated over a week ago

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

Sometimes rating information can't be extracted the same way as plain text such as a page title. In the case below, the rating is stored in the value of the "alt" attribute of the "img" element rather than in visible text. This tutorial shows how to scrape this kind of star rating information from web pages.
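To illustrate why a plain text extraction returns nothing here, the following is a minimal Python sketch that runs outside Octoparse. The markup in SAMPLE_HTML, the RatingParser class, and the rating wording are all hypothetical; real pages will use different class names and text.

```python
from html.parser import HTMLParser

# Hypothetical markup for a star rating; not taken from any real page.
SAMPLE_HTML = (
    '<div class="rating">'
    '<img src="/img/stars-45.png" alt="4.5 out of 5 stars">'
    "</div>"
)


class RatingParser(HTMLParser):
    """Collect visible text and the alt value of each img element."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.alt_values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt is not None:
                self.alt_values.append(alt)

    def handle_data(self, data):
        # Keep only non-blank text nodes.
        if data.strip():
            self.text_parts.append(data.strip())


parser = RatingParser()
parser.feed(SAMPLE_HTML)
parser.close()

visible_text = " ".join(parser.text_parts)
print(repr(visible_text))   # the element has no visible text
print(parser.alt_values)    # the rating lives in the alt attribute
```

In this sample the element contains no text node at all, so only the attribute value carries the rating.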

There are two ways to fetch the star rating info.


1. Extract attributes from the source code

1. Select the rating area on the web page and choose Image URL from the Tips panel. You can also choose OuterHtml here. This step only creates a data field; the extracted value is changed in the following steps.

2. Click the Extract Data action and click the "..." icon. Then choose Customize field.

3. Choose Select other attributes, then pick the "alt" attribute, which holds the rating.

4. The result will be displayed in the field.


2. Extract and clean the HTML code

1. Select the rating area on the web page and choose OuterHtml.

2. Click Extract Data and click the "..." icon. Then choose Clean data.

3. After that, click Add Step and then choose Match with Regular Expression.


4. If you know how to use Regular Expression, you can enter the formula directly in the Regular Expression box. If you're not familiar with it, click "Not sure about RegEx? Try the RegEx tool!".

5. Click Start with and enter the part of the string that comes immediately before the information you need. Next, click End with and enter the part of the string that comes immediately after it.

After that, click Match to check whether the matched text is what you need. Then click Apply.

6. Go back to the settings and confirm the new step.

7. Once all the settings are done, click Apply to save.
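The Start with / End with idea from step 5 can be sketched in Python as a lazy match between two literal strings. This is only an illustration under assumptions: the outer_html value and the match_between helper are hypothetical, and the expression Octoparse's RegEx tool generates may look different.

```python
import re

# Hypothetical OuterHtml value for the rating element.
outer_html = '<img class="stars" src="/img/stars-45.png" alt="4.5 out of 5 stars">'


def match_between(text, start_with, end_with):
    """Return the shortest text between start_with and end_with, or None."""
    # re.escape treats both boundaries as literal strings, and the lazy
    # group (.*?) stops at the first occurrence of end_with.
    pattern = re.escape(start_with) + r"(.*?)" + re.escape(end_with)
    found = re.search(pattern, text, flags=re.DOTALL)
    return found.group(1) if found else None


# Start with: alt="    End with: "
rating_text = match_between(outer_html, 'alt="', '"')
print(rating_text)
```

Choosing boundaries that appear only once in the HTML, such as the attribute name plus its opening quote, keeps the match from picking up another attribute's value.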
