Using the blacklist and whitelist (ignore or include URLs)

Overview

You can define which subpages Ryte should check and which should be excluded from crawling. This is helpful if, for example, you only want to crawl a certain country-specific directory.

You can include or exclude URLs within the project settings. Click on your project in the upper right corner, then click on the gear symbol in the drop-down menu to open the project settings.

Click on advanced analysis. The option Ignore/include URLs is listed there.

Exclude URLs (blacklist):

You can exclude URLs from your crawl by adding blacklist rules. In this example, we want to exclude the Magazine and our Wiki, which we do by blacklisting their subfolders. The rules should look like this:

These rules exclude all URLs whose path starts with /wiki/ or /magazine/.
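
To make the "path starts with" behavior concrete, here is a minimal Python sketch of prefix-based exclusion. It is only an illustration of the matching idea described above, not Ryte's implementation or rule syntax; the names BLACKLIST_PREFIXES and is_excluded are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical rule list mirroring the example above (not Ryte's rule syntax).
BLACKLIST_PREFIXES = ("/wiki/", "/magazine/")


def is_excluded(url, prefixes=BLACKLIST_PREFIXES):
    """Return True if the URL's path starts with any blacklisted prefix."""
    path = urlsplit(url).path
    # str.startswith accepts a tuple and checks every prefix in it.
    return path.startswith(tuple(prefixes))
```

Under this assumption, a URL such as https://en.ryte.com/wiki/Robots.txt would be skipped, while https://en.ryte.com/product-insights/whitelist-blacklist-feature would still be crawled.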

Include URLs (whitelist):

The whitelist offers the same functionality as the blacklist, but it works the opposite way. If you need "include only" rules, you can use our whitelist feature.

In this example, we want to crawl ONLY our Magazine and Wiki. We do this by whitelisting each subfolder:

These rules include all URLs whose path starts with /wiki/ or /magazine/.
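
The "include only" behavior can be sketched the same way: a URL is kept only if its path starts with one of the whitelisted prefixes. This is a simplified illustration under that assumption, not Ryte's actual crawler logic; WHITELIST_PREFIXES, is_included, and filter_crawl_list are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical rule list mirroring the example above (not Ryte's rule syntax).
WHITELIST_PREFIXES = ("/wiki/", "/magazine/")


def is_included(url, prefixes=WHITELIST_PREFIXES):
    """Return True if the URL's path starts with any whitelisted prefix."""
    path = urlsplit(url).path
    return path.startswith(tuple(prefixes))


def filter_crawl_list(urls, prefixes=WHITELIST_PREFIXES):
    """Keep only the URLs that match at least one whitelist prefix."""
    return [url for url in urls if is_included(url, prefixes)]
```

With these example rules, a list containing a Wiki page, a Magazine article, and a product page would be reduced to the first two.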

You can apply a rule at any directory depth if you want to exclude specific pages, e.g.:

https://en.ryte.com/product-insights/whitelist-blacklist-feature

You can add as many rules as you like.

Important: If you only want to crawl a certain subfolder, you don't need to set this up via include URLs. In this case, you can simply use the "Crawl subfolder" option in the project settings.