Schedule step overview

Set up the schedule for ongoing data extractions to keep your data up-to-date.

Written by Jon Tam
Updated over 3 months ago

Crux uses the schedule you set in this step to establish the dataset workflows that keep your data up-to-date.

The Schedule step in data onboarding shows your dataset’s configured schedule, offers monitoring options for late or missed deliveries, and displays the inferred schedule, which is based on source data patterns and file modification timestamps.

Configure dataset schedule

Use the Configured schedule to specify when to check the data source for updates.

To modify the Configured schedule, click the "Edit" button and adjust the frequency and time of data ingestion to suit your needs.

Monitor late or missed deliveries

Crux can monitor your data deliveries and alert you when updates are delayed or missed. Crux does not control when data becomes available at the source, but it can notify you when a delivery is late relative to your configured thresholds.
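As a rough illustration of how a delay threshold turns a scheduled time into a deadline, here is a minimal Python sketch. The function name and parameters are hypothetical and are not part of any Crux API; the sketch only assumes that a delivery counts as late once the scheduled time plus the threshold has passed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_delivery_late(
    scheduled_at: datetime,
    delay_threshold: timedelta,
    arrived_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the delivery missed its deadline (hypothetical helper)."""
    # Assumption: the deadline is the scheduled time plus the configured delay.
    deadline = scheduled_at + delay_threshold
    if arrived_at is not None:
        # The data arrived: late only if it arrived after the deadline.
        return arrived_at > deadline
    # No data yet: late once the current time has passed the deadline.
    return now > deadline


scheduled = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
threshold = timedelta(hours=2)

on_time = is_delivery_late(
    scheduled, threshold,
    arrived_at=datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc),
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
still_missing = is_delivery_late(
    scheduled, threshold,
    arrived_at=None,
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(on_time, still_missing)  # prints "False True"
```

A larger threshold reduces alerts for deliveries that are only slightly late, at the cost of learning about a genuinely missed delivery later.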

Recommended delay threshold

Crux AI Technology suggests a delay threshold based on the dataset's historical delivery patterns. At least ten deliveries are needed to analyze past delivery trends and create a recommendation.
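The method Crux uses to derive its recommendation is not described here. The Python sketch below only illustrates the ten-delivery minimum: it returns no suggestion until enough history exists, and then falls back on a deliberately simple stand-in heuristic (the longest delay observed so far). The function name and the heuristic are assumptions made for illustration.

```python
from datetime import timedelta
from typing import Optional, Sequence

# Minimum history required before a recommendation is made (from the text above).
MIN_DELIVERIES = 10


def suggest_delay_threshold(
    observed_delays: Sequence[timedelta],
) -> Optional[timedelta]:
    """Suggest a threshold from past delays (hypothetical stand-in heuristic)."""
    if len(observed_delays) < MIN_DELIVERIES:
        # Not enough history to analyze delivery trends.
        return None
    # Assumption: use the longest delay seen so far as the suggestion.
    return max(observed_delays)


history = [timedelta(minutes=m) for m in (5, 12, 8, 45, 10, 7, 9, 15, 6, 11)]
print(suggest_delay_threshold(history[:9]))  # prints "None"
print(suggest_delay_threshold(history))      # prints "0:45:00"
```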

Manual delay threshold

Use this option to set a delivery deadline manually when a recommended deadline is unavailable or you want to override it. You can either add a time delay to the configured schedule (Option 2) or define the delivery deadline with a custom Cron expression (Option 3). Crux uses the Spring cron format, which has six fields separated by single spaces: second, minute, hour, day of month, month, and day of week.
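Spring-style cron expressions list their six fields in the order second, minute, hour, day of month, month, day of week. The Python sketch below splits an expression into those named fields so you can sanity-check it before entering it. The helper is hypothetical and checks only the field count and spacing, not the value ranges; it is not the validation Crux itself performs.

```python
# Field names follow Spring's six-field cron layout.
SPRING_CRON_FIELDS = (
    "second", "minute", "hour", "day_of_month", "month", "day_of_week",
)


def split_spring_cron(expression: str) -> dict:
    """Split a six-field cron expression into named fields (hypothetical helper)."""
    parts = expression.split(" ")
    # An empty part means a doubled, leading, or trailing space.
    if len(parts) != len(SPRING_CRON_FIELDS) or "" in parts:
        raise ValueError(
            f"expected {len(SPRING_CRON_FIELDS)} single-space-separated "
            f"fields, got {expression!r}"
        )
    return dict(zip(SPRING_CRON_FIELDS, parts))


# Example: a deadline at 09:30:00 on weekdays.
fields = split_spring_cron("0 30 9 * * MON-FRI")
print(fields["hour"], fields["minute"])  # prints "9 30"
```

A common mistake is pasting a five-field Unix-style expression such as `30 9 * * MON-FRI`; the helper above rejects it because the leading seconds field is missing.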

When monitoring is enabled, the Health dashboard shows delayed or missed deliveries.

✨ Review inferred schedule

Crux analyzes file patterns and timestamps from the data source to determine how often the dataset is updated. By default, the Configured schedule matches the Inferred schedule, but you can customize it as needed. Aim to ingest data soon after the data supplier updates it at the source, but not so early that the data is read before it is ready: premature checks cause unnecessary alerts and waste processing resources. This section of the interface is read-only; use it as a reference when adjusting the Configured schedule.
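To give a feel for how an update frequency can be read off file modification timestamps, here is a minimal Python sketch that takes the median gap between consecutive timestamps. This is an illustrative assumption, not a description of how Crux computes the Inferred schedule, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Sequence


def infer_update_interval(modified_at: Sequence[datetime]) -> timedelta:
    """Estimate the typical gap between file updates (hypothetical helper)."""
    if len(modified_at) < 2:
        raise ValueError("need at least two timestamps to measure a gap")
    ordered = sorted(modified_at)
    # Gaps between consecutive modification times, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    # The median is less sensitive than the mean to one unusually late file.
    return timedelta(seconds=median(gaps))


timestamps = [
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 2, 6, 5),
    datetime(2024, 1, 3, 6, 0),
    datetime(2024, 1, 4, 6, 0),
]
print(infer_update_interval(timestamps))  # prints "1 day, 0:00:00"
```

In this example the files land around 06:00 each day, so a configured check shortly after that time would pick up new data without reading it prematurely.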

Next steps

After you have set up the Configured schedule and delivery thresholds/monitoring preferences, proceed to the next step, setting up Destinations.
