Select data step overview

Select files and file patterns for your data product.

Written by Jon Tam
Updated over 3 months ago

In this step, you’ll select files and folders from your source directory to include in your data product. Crux allows you to define up to 25 filename patterns per data product. These patterns help the system recognize and process files automatically from your data source.

How to select data

  1. Browse the data source directory

    • Use the Browse Files section on the left to explore your data source directory. Navigate through the folders and select the relevant files or patterns to be included in your data product.

  2. Supercharge your data onboarding with AI

    • Click the AI file pattern discovery icon next to any folder in the Browse Files section. Crux’s AI engine scans the entire remote data source directory and surfaces the file patterns it finds.

    • Once discovered, these file patterns will appear in the File Patterns panel.

    • Check the boxes next to the patterns you want to include in your data product.

  3. Add filename patterns

    • You can also select files from the directory or manually enter a custom filename pattern in the right panel. For example, a pattern such as annual_report_%Y-%m-%d.csv automatically matches every file that follows that naming convention.

    • Selected filename patterns will appear in the Selected filename patterns section.

  4. Review selected filename patterns

    • The Selected filename patterns list shows details for each pattern you selected or defined, such as its folder path and the number of files that match it.

    • To make adjustments, click the trash icon to remove a pattern or use the pencil icon to edit it.
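To illustrate how a date-token pattern like the one in step 3 can match filenames, here is a minimal Python sketch. It assumes the tokens %Y, %m, and %d behave like the corresponding strftime directives (four-digit year, two-digit month, two-digit day); the article does not spell out Crux's exact matching rules, and the helper names pattern_to_regex and matches are hypothetical, not part of any Crux API.

```python
import re
from datetime import datetime

# Assumption: tokens follow strftime semantics with fixed widths.
# Only the three tokens used in the article's example are covered.
TOKEN_REGEX = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a filename pattern into a regex, escaping literal text."""
    parts = re.split(r"(%[Ymd])", pattern)
    body = "".join(TOKEN_REGEX.get(part, re.escape(part)) for part in parts)
    return re.compile(body)


def matches(pattern: str, filename: str) -> bool:
    """Return True if the filename fits the pattern and holds a real date."""
    if pattern_to_regex(pattern).fullmatch(filename) is None:
        return False
    try:
        # strptime rejects impossible dates such as month 13.
        datetime.strptime(filename, pattern)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    pattern = "annual_report_%Y-%m-%d.csv"
    for name in ["annual_report_2024-03-31.csv", "annual_report_2024-13-01.csv"]:
        print(name, matches(pattern, name))
```

Running a handful of real filenames through a check like this before entering a custom pattern can help confirm that the pattern captures the files you expect and nothing else.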

Limits

  • You can include up to 25 filename patterns for each data product.

  • The total size of all files matched by your selected patterns must not exceed 50 GB.

  • Each individual file can be up to 1.5 GB.

  • See the Guidelines for supported sources, formats, and limitations to learn more about supported file formats and how to select qualifying datasets.
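The limits above can be checked locally before you build a selection. The following Python sketch is only an illustration: it assumes decimal gigabytes (1 GB = 10^9 bytes), which the article does not specify, and the function name check_limits and its input shape (a mapping from pattern to the byte sizes of its matching files) are hypothetical.

```python
MAX_PATTERNS = 25
GB = 10**9  # Assumption: decimal gigabytes; the article does not say which unit applies.
MAX_TOTAL_BYTES = 50 * GB
MAX_FILE_BYTES = int(1.5 * GB)


def check_limits(files_by_pattern: dict[str, list[int]]) -> list[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []

    if len(files_by_pattern) > MAX_PATTERNS:
        problems.append(
            f"{len(files_by_pattern)} patterns selected; the limit is {MAX_PATTERNS}"
        )

    total = sum(sum(sizes) for sizes in files_by_pattern.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total / GB:.1f} GB exceeds 50 GB")

    for pattern, sizes in files_by_pattern.items():
        oversized = sum(1 for size in sizes if size > MAX_FILE_BYTES)
        if oversized:
            problems.append(f"{pattern}: {oversized} file(s) larger than 1.5 GB")

    return problems


if __name__ == "__main__":
    selection = {"annual_report_%Y-%m-%d.csv": [200_000_000, 2 * GB]}
    for problem in check_limits(selection):
        print(problem)
```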

Supported file formats

  • Crux supports Delimited Text, Avro (flat), and Parquet (flat) file formats.

  • Files in other formats will be ingested in Raw format.
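As a rough way to anticipate how a file will be ingested, the sketch below maps filename extensions to the formats listed above. The extension table is an assumption made for illustration; the article does not say how Crux detects formats, and an extension cannot reveal whether an Avro or Parquet file is flat. The function name expected_ingestion is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: these extensions indicate the supported formats.
# Crux's actual detection logic is not described in the article.
ASSUMED_EXTENSIONS = {
    ".csv": "Delimited Text",
    ".tsv": "Delimited Text",
    ".txt": "Delimited Text",
    ".avro": "Avro",
    ".parquet": "Parquet",
}


def expected_ingestion(filename: str) -> str:
    """Guess the ingestion format from the final extension; default to Raw."""
    suffix = PurePosixPath(filename).suffix.lower()
    return ASSUMED_EXTENSIONS.get(suffix, "Raw")


if __name__ == "__main__":
    for name in ["annual_report_2024-03-31.csv", "holdings.parquet", "notes.pdf"]:
        print(name, "->", expected_ingestion(name))
```

Because only the final extension is inspected, a compressed file such as archive.csv.gz falls through to Raw in this sketch.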

Next steps

Once you’ve selected all the file patterns you need, click Next: Model to proceed to the data modeling step, where you’ll define schemas and set up data profiling.
