Machine Learning Classifiers

Training

Selecting the data source

The user can train a machine learning model on data from one of the following sources:

The data that has been processed with the tool during the current session.
An uploaded CSV file with the data. The file should contain the same features as the training data.
Sample data used for testing the tool.
A new testing set
A new training set
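
The requirement that an uploaded CSV file contains the same features as the training data can be checked before any training or prediction starts. The following is a minimal sketch using only the Python standard library; the function name `missing_features`, the feature names, and the file contents are hypothetical and are not taken from the tool itself.

```python
import csv
import io


def missing_features(csv_text, expected_features):
    """Return the expected feature names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in expected_features if name not in present]


# Hypothetical feature names and file contents, for illustration only.
expected = ["mass", "charge", "intensity"]
uploaded = "mass,charge,label\n1.0,2,A\n"

missing = missing_features(uploaded, expected)
if missing:
    print("The uploaded file lacks these features:", missing)
```

An empty result means every expected feature is present; extra columns in the file are ignored in this sketch.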

Train-test split

The data is split into training and testing sets. The user can specify the size of the training set by moving the slider. The default value is 0.8 (80% of the data is used for training).
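
As an illustration of what the slider value means, the sketch below shuffles a list of rows and cuts it at the chosen fraction. It is an assumption about the general technique, not the tool's actual implementation (which may, for example, stratify by class); `split_rows` and its parameters are hypothetical names.

```python
import random


def split_rows(rows, train_size=0.8, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_size))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))          # stand-in for 100 data rows
train, test = split_rows(rows)   # default: 80% of the rows for training
print(len(train), len(test))
```

With the default of 0.8 and 100 rows, 80 rows go to the training set and the remaining 20 to the testing set.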

Selecting number of folds

The user can select the number of folds for cross-validation. The default value is 5.
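
To show what the number of folds controls, here is a minimal sketch of how k-fold cross-validation partitions sample indices: each fold serves once as the validation set while the remaining folds are used for training. The function `k_fold_indices` is a hypothetical helper, and the tool may shuffle or stratify the data in ways not shown here.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, n_folds=5))
for train_idx, val_idx in folds:
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```

Every sample appears in exactly one validation fold, so with 5 folds each model is trained and evaluated 5 times.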

Selecting the classifiers

The user can select which classifiers to train. The following classifiers are available:

Once the classifiers are selected, the models are trained and a results report for each classifier is output. The combined ROC curves for all the selected models are available as well.
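
For readers unfamiliar with ROC curves, the sketch below shows how the points of a curve and the area under it (AUC) can be computed from true binary labels and predicted scores, and how several models can be compared on the same data. It assumes labels encoded as 0 and 1; the model names and scores are made up for illustration, and the tool's own plotting code is not shown here.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Visit samples from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for idx, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied-score group.
        if idx == len(pairs) - 1 or pairs[idx + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve, computed with the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


labels = [0, 0, 1, 1]
# Hypothetical scores from two models on the same four samples.
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
results = {name: auc(*roc_points(labels, s)) for name, s in model_scores.items()}
for name, value in results.items():
    print(name, "AUC =", value)
```

Plotting each model's `fpr` against `tpr` on shared axes gives a combined ROC plot; a curve closer to the top-left corner (higher AUC) indicates better separation of the two classes.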

Saving the model

(This feature is not available in the online app. Please use the local version of the app instead.)

The models can be saved internally. To do so, the user selects “yes” in the select box “Would you like to save this model?” in the training results box. To save the model properly, the user must create a new Experiment or choose an existing one, and give a name to the specific run.
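
The storage backend used by the tool is not described here. Purely as an illustration of organising saved models by experiment and run name, the sketch below writes a model to `<root>/<experiment>/<run>/model.pkl` with Python's `pickle` module. The directory layout, the file name, and the functions `save_model` and `load_model` are assumptions, not the tool's actual behaviour.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, root, experiment, run_name):
    """Pickle the model under <root>/<experiment>/<run_name>/model.pkl."""
    run_dir = Path(root) / experiment / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "model.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a model previously written by save_model."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# A plain dict stands in for a trained model in this example.
model = {"classifier": "example", "threshold": 0.5}

with tempfile.TemporaryDirectory() as root:
    saved_path = save_model(model, root, "demo_experiment", "run_01")
    restored = load_model(saved_path)

print(restored == model)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.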

Inference

A trained model can be loaded and used on unlabelled data to predict labels. The user selects the model to load from the dropdown of previously saved models, and then selects either the data that has been processed with the tool during the current session or the sample data used for testing the tool. Once the model is loaded and the data is selected, the user can click the Predict button to make predictions.
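
Conceptually, the Predict step applies the loaded model to each unlabelled row and returns one label per row. The sketch below uses a deliberately simple stand-in classifier (nearest class centroid) to make that step concrete; the class `NearestCentroidModel`, its centroids, and the data are hypothetical and unrelated to the classifiers offered by the tool.

```python
class NearestCentroidModel:
    """Stand-in classifier: predicts the label of the closest class centroid."""

    def __init__(self, centroids):
        # Mapping from label to the mean feature vector of that class.
        self.centroids = centroids

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids, key=lambda label: sq_dist(row, self.centroids[label]))
            for row in rows
        ]


# In practice the model would come from the saved-model dropdown.
model = NearestCentroidModel({"A": [0.0, 0.0], "B": [5.0, 5.0]})

# Unlabelled rows must use the same features, in the same order, as training.
unlabelled = [[0.5, -0.2], [4.1, 6.0], [2.0, 1.0]]
predictions = model.predict(unlabelled)
print(predictions)
```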