How can I keep the duplicates in Cloud runs?

Keep all the scraped data, not just the new data lines.

Updated over a week ago

If you run a task multiple times, Octoparse may report duplicates on the Dashboard.

This is because Octoparse stores the data scraped from all runs together and checks it for duplicates. Any duplicates it finds are deleted automatically from the Cloud.

Duplicates refer to data lines that are identical in all columns. While they may not be necessary in most cases, there are certain situations where keeping all the scraped data for comparison can be beneficial.
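
To make the definition concrete, here is a minimal Python sketch of what "identical in all columns" means. It is only an illustration and not Octoparse's actual implementation; the function name `drop_exact_duplicates` and the sample columns `title` and `price` are hypothetical.

```python
def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row.

    Two rows count as duplicates only if every column has the same value.
    """
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from all column/value pairs of the row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical data from two runs of the same task.
run_1 = [{"title": "Widget", "price": "9.99"}]
run_2 = [
    {"title": "Widget", "price": "9.99"},  # identical in all columns
    {"title": "Widget", "price": "8.99"},  # price changed, so not a duplicate
]

stored = drop_exact_duplicates(run_1 + run_2)
print(len(stored))
```

In this sketch the repeated row is collapsed into one, while the row with a different price is kept because it differs in at least one column.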

How to keep duplicates?

To keep duplicates, add the current date & time as a field in the task:

  • Go to the Data Preview

  • Click on the Add Custom Field button

  • Choose Current date & time

[Screenshot: current_time.jpg]

The field will be added like this:

[Screenshot: mceclip1.png]

The field records the date and time at which each data row was scraped. Because each row is scraped at a different time, rows now differ in the current_time field. They are no longer identical in all columns, so they won't be treated as duplicates.
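
The effect of the extra field can be sketched in Python as follows. This is an illustration under assumptions, not Octoparse's internal logic: the helper names, the sample columns, the run dates, and the timestamp format are all hypothetical.

```python
from datetime import datetime, timedelta


def with_current_time(row, scraped_at):
    """Return a copy of the row with a current_time column added."""
    return {**row, "current_time": scraped_at.strftime("%Y-%m-%d %H:%M:%S")}


def count_distinct_rows(rows):
    """Count rows that differ in at least one column."""
    return len({tuple(sorted(row.items())) for row in rows})


# The same row scraped in two separate runs (hypothetical values).
row = {"title": "Widget", "price": "9.99"}
first_run = datetime(2024, 1, 1, 9, 0, 0)
second_run = first_run + timedelta(days=1)

without_field = [row, row]
with_field = [
    with_current_time(row, first_run),
    with_current_time(row, second_run),
]

distinct_without = count_distinct_rows(without_field)  # 1: rows are identical
distinct_with = count_distinct_rows(with_field)        # 2: timestamps differ
print(distinct_without, distinct_with)
```

Without the timestamp column the two rows are identical and count as one. With it, they differ in `current_time` and both are kept.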
