Experimental data is sometimes subject to artifacts such as contamination and overflow readings. Such events can lead to data outliers. Since these points can skew the fitted model significantly, it is good practice to exclude them from a dataset. However, if too many points are unjustifiably excluded, the fitted model will not capture the true shape of the data. Hence, outlier identification has to be approached thoughtfully.
More detailed information on subset data selections can be found here.
In Synthace, you can exclude points or select subsets of data using three mechanisms: plot selection, table selection, or filtering.
How to select a subset of the dataset
First, navigate to the Select & Transform tab.
Select a dataset subset using the plots:
Three plots are available in the Select & Transform tab: Run Sequence, Histogram and Normal Plot. They are linked, meaning that any selection is displayed in all three plots simultaneously. The selection will also be mirrored in the table underneath the plots.
In any of the plots, click and drag the mouse cursor over the points that you want to include in the dataset.
If you want to include more points, hold down the Shift key and select more points by clicking and dragging the mouse cursor.
To deselect all points, click and drag the mouse cursor in an empty area of any of the plots.
Note: with this approach, you can only add more points to the selection or completely deselect all points, but not deselect specific points.
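The plot selection rules above can be read as set operations on row indices. The following is a minimal Python sketch of that reading, not Synthace's implementation: the class and method names (PlotSelection, drag, shift_drag) are hypothetical, and it assumes that a plain click-and-drag replaces the current selection.

```python
class PlotSelection:
    """Toy model of the plot selection rules (hypothetical, not the Synthace API)."""

    def __init__(self):
        self.selected = set()

    def drag(self, rows):
        # Assumption: a plain click-and-drag replaces the current selection
        # with the points under the cursor. Dragging over an empty area
        # (no rows) therefore deselects everything.
        self.selected = set(rows)

    def shift_drag(self, rows):
        # Shift + drag adds points to the existing selection (set union).
        self.selected |= set(rows)


sel = PlotSelection()
sel.drag([0, 1, 2])          # select three points
sel.shift_drag([5, 6])       # add two more while holding Shift
print(sorted(sel.selected))  # [0, 1, 2, 5, 6]
sel.drag([])                 # drag in an empty area
print(sorted(sel.selected))  # []
```

There is deliberately no method for removing a single point, which mirrors the note above: in the plots you can only extend the selection or clear it.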
Select a dataset subset using the table:
As above, any selection made in the table will be mirrored in the plots as well.
By default, all points are selected in the table, meaning that they are all included in the dataset.
From the checkboxes in the leftmost column of the table, deselect the points that you want to exclude.
Similarly, you can re-include points in the selection by clicking the corresponding checkboxes.
Note: Clicking the checkbox column header in the table will select / deselect all points.
Select a dataset subset using a filter:
Underneath the plots, click the + Filter button.
From the drop-down (Select factor for filter), select the quantity that you want to use for filtering.
Note: For a categorical or discrete quantity, a selection box with the levels of that quantity will appear. For a continuous quantity, a range (the minimum and maximum of that quantity) will appear.
For a categorical / discrete quantity, remove, one by one, the levels that you want to exclude by clicking the X next to the level. If you want to re-add a level, click inside the selection box, scroll down to find the desired level, and click to add it.
For a continuous quantity, you can exclude points by setting a larger minimum bound or a smaller maximum bound. You can re-include points by adjusting the bounds.
To remove a filter, click on the orange X button corresponding to the filter.
Note: You can add multiple filters by clicking the + Filter button. In this case, the selection will be the result of the intersection of all filtering conditions.
Note: The points selected or deselected via the filtering mechanism will be reflected in the plots and in the table. However, further interaction with the plots or the table will not update the filters. It is therefore advisable to choose one approach when selecting a subset of the data: either the filters, or the plots and / or the table.
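To make the filter logic concrete, here is a small Python sketch of how categorical and continuous filters combine as an intersection. It is an illustration only, not Synthace code: the function names, the column names, and the example data are made up, and treating the range bounds as inclusive is an assumption.

```python
def categorical_filter(column, allowed_levels):
    """Keep rows whose value in `column` is one of the allowed levels."""
    allowed = set(allowed_levels)
    return lambda row: row[column] in allowed


def range_filter(column, minimum, maximum):
    """Keep rows whose value in `column` lies within [minimum, maximum].

    Inclusive bounds are an assumption made for this sketch.
    """
    return lambda row: minimum <= row[column] <= maximum


def apply_filters(rows, filters):
    """Return the indices of rows that satisfy every filter (intersection)."""
    return [i for i, row in enumerate(rows) if all(f(row) for f in filters)]


# Made-up example data; the column names are illustrative only.
rows = [
    {"buffer": "A", "signal": 0.8},
    {"buffer": "B", "signal": 1.1},
    {"buffer": "A", "signal": 9.7},  # e.g. an overflow-like reading
    {"buffer": "C", "signal": 1.0},
]

filters = [
    categorical_filter("buffer", ["A", "B"]),  # level "C" has been removed
    range_filter("signal", 0.0, 2.0),          # a smaller maximum bound
]
print(apply_filters(rows, filters))  # [0, 1]
```

A row is kept only if it passes every filter, so adding a filter can only shrink the selection, never grow it.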
Invert a selection:
To invert a selection, click the Invert button above the table. The inverted selection will be mirrored in the plots and the table. If filters are in use, they will not be updated to reflect the inversion.
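Inverting a selection amounts to taking the complement of the selected rows. A minimal Python sketch, assuming rows are identified by the indices 0 to n_rows - 1 (the function name invert is hypothetical and not part of Synthace):

```python
def invert(selected, n_rows):
    """Return the complement of `selected` among row indices 0..n_rows-1."""
    return set(range(n_rows)) - set(selected)


# Rows 0, 1 and 4 are selected out of 6 rows in total.
print(sorted(invert({0, 1, 4}, 6)))  # [2, 3, 5]
```

One way to use this, under the same assumptions: select suspected outliers in a plot, then invert, so that everything except those points remains in the dataset.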
How to save a subset of the dataset
You can save your selection under a name of your choice. Once you have modified a selection, its name under Row Selection, on the left side of the page, will include the suffix (edited).
To save the selection, click on the Save As button underneath the Row Selection dropdown menu.
Type in a name for the subset. Click OK.
From the dropdown menu under Row Selection, you can select to view the different subsets you have previously created.
Note: If you want to dismiss the selection changes you made to a dataset, click Cancel under the Row Selection dropdown menu.
Alternatively:
You can first copy a dataset selection by clicking Copy under the Row Selection dropdown menu.
Type in a new name and click OK.
Proceed with the selections you want to make, using the mechanisms described above.
Click the Save button under the Row Selection dropdown menu.
Note: You will be asked whether you want to overwrite the Row Selection. Click OK to accept.
How to delete a dataset selection:
From the dropdown menu under Row Selection, select the dataset of interest.
Click the Delete button under the dropdown menu.
Note: You will be prompted to confirm the deletion. Click OK if you want to proceed.
How to rename a dataset selection:
From the dropdown menu under Row Selection, select the dataset of interest.
Click the Rename button under the dropdown menu.
Type in a new name and click OK.
Troubleshooting
When you try to delete a subset that has been used to create a model in the Create Models tab, a pop-up will prevent the deletion. To proceed, first navigate to the Create Models tab and delete the model that uses the subset selection. Then return to the Select & Transform tab and delete the subset.
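The dependency described above (a subset cannot be deleted while a model still uses it) can be sketched in Python as follows. This is a toy model for illustration, not how Synthace stores selections: SelectionStore and all of its methods are hypothetical names.

```python
class SelectionStore:
    """Toy registry of named row selections and the models built on them."""

    def __init__(self):
        self.selections = {}  # selection name -> set of row indices
        self.models = {}      # model name -> name of the selection it uses

    def save_as(self, name, rows):
        self.selections[name] = set(rows)

    def create_model(self, model_name, selection_name):
        self.models[model_name] = selection_name

    def delete_model(self, model_name):
        del self.models[model_name]

    def delete_selection(self, name):
        # Refuse to delete a selection that a model still depends on.
        users = [m for m, s in self.models.items() if s == name]
        if users:
            raise ValueError(f"'{name}' is used by model(s): {users}")
        del self.selections[name]


store = SelectionStore()
store.save_as("no outliers", [0, 1, 3])
store.create_model("model 1", "no outliers")

try:
    store.delete_selection("no outliers")  # blocked: a model uses it
except ValueError as err:
    print("blocked:", err)

store.delete_model("model 1")          # first delete the dependent model
store.delete_selection("no outliers")  # now the deletion succeeds
print(store.selections)  # {}
```

The order of the last two calls matches the troubleshooting steps: delete the model first, then the subset.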
Well done for making it to the end of this tutorial.
To learn how to apply a predefined transformation to your data, click here.
To learn how to apply predefined column based calculations to your data, click here.
To learn how to apply custom column based calculations to your data, click here.
To learn how to explore your models and make predictions, click here.