
How to fix incorrect data extraction

Updated over 2 months ago

Before You Begin

This tutorial applies to the latest Octoparse version. For optimal performance, upgrade now if you're using an older release.


The Problem: Misdirected Data

When running tasks (locally or in the cloud), you might encounter:

  • Data extracted to the wrong columns

  • Missing data fields

Root Cause:
The XPath expression for the affected field does not locate the target element consistently across pages, so on some pages it matches a different element or nothing at all.

Example:

This is the data we expected:

2022-06-15_11-34-16.jpg

This is the actual output. Note that some of the highlighted data is not extracted correctly.

2022-06-15_11-47-03.jpg
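
To make the failure mode concrete, here is a minimal sketch in Python of how a position-based XPath can produce both symptoms. The markup, class names, and variable names are hypothetical and not taken from any real site or from Octoparse itself. It uses the standard library's ElementTree, which supports only a subset of XPath, so treat it as an illustration rather than a reproduction of how Octoparse evaluates expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing markup: the second item has no "rating" element.
PAGE = """
<div>
  <div class="item">
    <span class="name">Widget A</span>
    <span class="rating">4.5</span>
    <span class="price">$10</span>
  </div>
  <div class="item">
    <span class="name">Widget B</span>
    <span class="price">$12</span>
  </div>
</div>
"""


def extract(item, xpath):
    """Return the text of the first match inside one item, or None if nothing matches."""
    node = item.find(xpath)
    return node.text if node is not None else None


root = ET.fromstring(PAGE)
items = root.findall("./div[@class='item']")

# Position-based expressions assume every item has the same layout.
rating_by_position = [extract(i, "./span[2]") for i in items]
price_by_position = [extract(i, "./span[3]") for i in items]

# Attribute-based expressions target the element itself, not its position.
rating_by_attribute = [extract(i, "./span[@class='rating']") for i in items]
price_by_attribute = [extract(i, "./span[@class='price']") for i in items]

print(rating_by_position)   # ['4.5', '$12']  -> price lands in the rating column
print(price_by_position)    # ['$10', None]   -> price is missing for the second item
print(rating_by_attribute)  # ['4.5', None]
print(price_by_attribute)   # ['$10', '$12']
```

In this sketch the attribute-based expressions keep each value in its own column and leave a genuinely absent field empty, while the position-based ones shift data as soon as one item differs from the others.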


The Solution: XPath Correction

Step 1: Write a Robust XPath

Learn XPath fundamentals with our guide:
🔗 What is XPath and How to Use It in Octoparse?

Step 2: Update the XPath in Your Task

  1. Click More (···) next to the problematic data field

    2022-06-15_11-41-02.jpg
  2. Select Customize XPath

    2022-06-15_11-42-10.jpg
  3. Replace the existing XPath with your new expression

  4. Click Save

Step 3: Validate with Test Run

Always test the updated task with Preview before running it in full, and confirm that each field appears in the correct column.
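
As a complement to testing inside Octoparse, a candidate XPath can also be sanity-checked against a few saved sample pages. The sketch below is one possible approach, not an Octoparse feature: `check_xpath` and `SAMPLE_PAGES` are hypothetical names, the markup is invented, and ElementTree handles only well-formed markup and a subset of XPath.

```python
import xml.etree.ElementTree as ET


def check_xpath(xpath, pages):
    """Return {page_name: match_count} for pages where the XPath
    does not match exactly one element."""
    problems = {}
    for name, markup in pages.items():
        count = len(ET.fromstring(markup).findall(xpath))
        if count != 1:
            problems[name] = count
    return problems


# Hypothetical sample pages: page_2 has no price, page_3 has two.
SAMPLE_PAGES = {
    "page_1": "<div><h1 class='title'>A</h1><p class='price'>$10</p></div>",
    "page_2": "<div><h1 class='title'>B</h1></div>",
    "page_3": "<div><h1 class='title'>C</h1><p class='price'>$8</p><p class='price'>$9</p></div>",
}

title_problems = check_xpath(".//h1[@class='title']", SAMPLE_PAGES)
price_problems = check_xpath(".//p[@class='price']", SAMPLE_PAGES)

print(title_problems)  # {}
print(price_problems)  # {'page_2': 0, 'page_3': 2}
```

A page reported with 0 matches points to a field that would come out empty, and a page with more than one match points to an expression that is too broad and may pick up the wrong element.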

Tips

✔ Use relative XPath (not absolute) for dynamic pages, because an absolute path breaks as soon as the page layout shifts
✔ Bookmark our XPath cheatsheet for common scenarios
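
The difference between absolute and relative XPath can be sketched in Python as follows. The two pages and their element ids are hypothetical. Because ElementTree evaluates paths from the root element, `./body/div[2]/span` stands in here for the absolute expression `/html/body/div[2]/span`.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: page B has an extra banner inserted before the content.
PAGE_A = (
    "<html><body>"
    "<div id='nav'></div>"
    "<div id='content'><span class='price'>$10</span></div>"
    "</body></html>"
)
PAGE_B = (
    "<html><body>"
    "<div id='nav'></div>"
    "<div id='banner'><span>Sale!</span></div>"
    "<div id='content'><span class='price'>$10</span></div>"
    "</body></html>"
)

# Stand-in for the absolute path /html/body/div[2]/span.
ABSOLUTE_STYLE = "./body/div[2]/span"
# Relative path anchored on attributes rather than on position in the page.
RELATIVE = ".//div[@id='content']/span[@class='price']"


def first_text(markup, xpath):
    """Return the text of the first element matching xpath, or None."""
    node = ET.fromstring(markup).find(xpath)
    return node.text if node is not None else None


absolute_results = [first_text(p, ABSOLUTE_STYLE) for p in (PAGE_A, PAGE_B)]
relative_results = [first_text(p, RELATIVE) for p in (PAGE_A, PAGE_B)]

print(absolute_results)  # ['$10', 'Sale!']
print(relative_results)  # ['$10', '$10']
```

On page B the absolute-style path silently returns the banner text instead of the price, while the attribute-anchored relative path returns the price on both pages.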
