SIMON

Allows users to configure machine learning models by selecting predictors, responses, model types, and preprocessing options.

Overview

The Predictive - Start tab in SIMON offers a simple way to set up and run predictive models.

In the analysis properties section, the user sets up the predictive model based on the dataset and its features. The setup options are:

Classification / Regression / Time Series: Choose the type of analysis to perform. Only the options that apply to your dataset and selected variables are displayed.
Predictor Variables: Select the independent variables (predictors) for the model. Enable the switch to select all columns, or specify individual columns by typing their names.
Response: Define the dependent variable (response) that the model will predict or classify.
Exclude Predictors: Specify any predictor variables that should be excluded from the analysis.
Training/Testing Dataset Partition (%): Use the slider to set the percentage of data used for training (default: 75%). The remaining data is held out as the testing set for model validation.
Additional Exploration Classes: Add exploratory variables that are not used in the model training but are available for analysis.
Preprocessing: Apply preprocessing methods such as centering or scaling to standardize data before training the model. You can select multiple options from the dropdown menu.
Reset Features & Selection: Clears all selected features, models, and settings, allowing you to start with a fresh configuration.
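The page does not document how SIMON decides which analysis types to offer. As a rough illustration of why the options depend on the selected variables, here is a hypothetical heuristic, written in Python with the standard library only. The function name, the max_classes threshold, and the has_time_column flag are all assumptions for illustration, not SIMON's actual logic.

```python
from numbers import Number


def candidate_analysis_types(response, has_time_column=False, max_classes=10):
    """Hypothetical heuristic: guess which analysis types fit a response column.

    NOT SIMON's documented behaviour; it only shows how the response's data
    type can constrain which analyses make sense.
    """
    values = [v for v in response if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    integer_like = numeric and all(float(v).is_integer() for v in values)
    distinct = len(set(values))

    types = []
    if not numeric or (integer_like and distinct <= max_classes):
        # Text labels, or integer codes with few levels, suit classification.
        types.append("classification")
    if numeric:
        types.append("regression")
        if has_time_column:
            # Assumption: time series needs a numeric response plus a time column.
            types.append("time series")
    return types
```

For example, a text response such as ["healthy", "disease", "healthy"] yields only "classification", while continuous values such as [0.5, 1.7, 2.9] yield only "regression".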

Example Workflow

Choose your predictor and response variables, configure preprocessing, and set the training/testing split.

Select Predictor Variables: In this case, all predictor variables are selected, and the Exclude Predictors option is used to remove non-contributing features from the analysis, such as arbitrary sample IDs.
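The "select all, then exclude" pattern can be sketched in Python as follows. This is a minimal illustration, not SIMON's implementation: the function name and parameters are hypothetical, and dropping the response column from the predictors automatically is an assumption.

```python
def select_predictors(columns, response, use_all=True, selected=None, exclude=()):
    """Hypothetical sketch of the 'select all columns' switch plus 'Exclude Predictors'."""
    if use_all:
        candidates = list(columns)
    else:
        chosen = list(selected or ())
        unknown = set(chosen) - set(columns)
        if unknown:
            raise ValueError(f"Unknown columns: {sorted(unknown)}")
        candidates = chosen
    excluded = set(exclude)
    # Assumption: the response is never used as its own predictor.
    return [c for c in candidates if c != response and c not in excluded]
```

With columns ["sample_id", "age", "bmi", "glucose", "outcome"], a response of "outcome", and "sample_id" excluded, the selected predictors are ["age", "bmi", "glucose"].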

Select Response Variable: Select the desired response variable. In a classification model, this is the outcome the model tries to predict from the predictor features.

Set Training/Testing Dataset Partition: Use the slider to adjust the split between training and testing data. In this case, we keep the default 75% partition, though a different split may suit some datasets or machine learning packages.
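For a classification response, a 75/25 partition is often drawn per class (stratified) so that both sets keep similar class proportions. The Python sketch below shows that idea with the standard library. Whether SIMON stratifies its split is an assumption, and the function name and seed are illustrative only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_idx, test_idx), sampling train_fraction of each class.

    Hypothetical sketch; SIMON's own partitioning method may differ.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)
```

With 40 "case" and 60 "control" rows, the default fraction puts 30 cases and 45 controls (75 rows) in training and the remaining 25 rows in testing.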

Configure Preprocessing: Select preprocessing methods to appropriately standardize data based on your dataset and the machine learning packages you plan to use.
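Centering subtracts each column's mean and scaling divides by its standard deviation. A common practice is to learn these statistics from the training set only and reuse them on the testing set, so no test-set information leaks into training. The Python sketch below illustrates this; the function names are hypothetical and it is not SIMON's preprocessing code.

```python
from statistics import mean, stdev


def fit_center_scale(train_rows):
    """Learn per-column mean and sample standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Constant columns have a standard deviation of 0; use 1.0 instead so they
    # are centered but not divided by zero.
    sds = [stdev(col) or 1.0 for col in columns]
    return means, sds


def apply_center_scale(rows, means, sds):
    """Center and scale rows using statistics learned from the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
```

For training rows [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], the learned means are [2.0, 20.0] and the standard deviations are [1.0, 10.0], so a test row [4.0, 20.0] becomes [2.0, 0.0].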


