Feature Selection


Feature Selection Algorithms

On the feature selection page the user can select the feature selection algorithms to run, choosing between statistics-based and machine-learning-based algorithms.

The following feature selection algorithms are available:

Statistic group:

  • Pearson Correlation: Computes Pearson product-moment correlation coefficients. Link

  • False Discovery Rate: Uses the Benjamini-Hochberg procedure to select features whose p-values fall below an estimated false discovery rate. Link

  • Joint Mutual Information: Uses estimated mutual information for a discrete target variable. Link

  • Protein Marker Selection: Uses multiview feature selection targeted especially at proteomics and multi-omics datasets. PyPI

  • Fold Change: Measures how much a quantity changes between an original and a subsequent measurement. It is computed as Fold Change = (mean of intensity values for experimental samples, i.e. samples with Parkinson’s) / (mean of intensity values for control samples, i.e. healthy samples); see the sketch after this list. Reference

  • T-test: Compares the means of two groups of samples. Link

  • ROTS: (only available if ROTS was calculated on the ROTS page) Reproducibility-optimized test statistic for ranking genes/proteins based on evidence for differential expression. Link
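
To make the statistics group concrete, the following is a minimal Python sketch of fold change, a two-sample t-test, and Benjamini-Hochberg FDR selection, using scipy and scikit-learn. The data, group labels, and protein column names are hypothetical, and the sketch illustrates the underlying methods rather than the tool's internal implementation:

    import numpy as np
    import pandas as pd
    from scipy import stats
    from sklearn.feature_selection import SelectFdr, f_classif

    # Hypothetical intensity matrix: rows are samples, columns are proteins.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.random((40, 100)) + 1.0,
                     columns=[f"protein_{i}" for i in range(100)])
    y = np.array([1] * 20 + [0] * 20)  # 1 = Parkinson's, 0 = healthy control

    # Fold change: mean intensity of experimental over mean of control samples.
    fold_change = X[y == 1].mean() / X[y == 0].mean()

    # Two-sample t-test per feature, comparing the two group means.
    t_stat, p_values = stats.ttest_ind(X[y == 1], X[y == 0])

    # Benjamini-Hochberg procedure on F-test p-values at an estimated 5% FDR.
    fdr = SelectFdr(f_classif, alpha=0.05).fit(X, y)
    print("FDR kept", fdr.get_support().sum(), "of", X.shape[1], "features")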

Machine Learning group:

  • Recursive Feature Elimination: Selects features by recursively considering smaller and smaller sets of features. Link

  • Decision Tree Classifier: Decision tree feature selection based on feature importance. Link

  • Random Forest Classifier: Random forest feature selection based on feature importance. Link

  • LightGBM Classifier: LightGBM feature selection based on feature importance. Link

  • XGBoost Classifier: XGBoost feature selection based on feature importance. Link

  • AdaBoost Classifier: AdaBoost feature selection based on feature importance. Link

  • Support Vector Classifier: Support vector machine feature selection based on feature importance. Link
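
Similarly, here is a minimal sketch of the importance-based approach used by the machine learning group, built on scikit-learn's SelectFromModel and RFE with a random forest. The synthetic dataset and parameter values are illustrative assumptions, not the tool's defaults:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE, SelectFromModel

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Importance-based selection: keep features above the median importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    sfm = SelectFromModel(rf, threshold="median").fit(X, y)
    print("SelectFromModel kept", sfm.get_support().sum(), "features")

    # Recursive feature elimination: repeatedly drop the weakest features.
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=10).fit(X, y)
    print("RFE selected", rfe.support_.sum(), "features")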

Once the feature selection algorithms are selected, the Run Feature Selection button applies them to identify a subset of features using each algorithm's default parameters. Once the run completes, the results can be examined by expanding the box labeled with each algorithm's name. Algorithm parameters can also be edited directly within the expanded box, or by unchecking the “Use Default Parameters” checkbox, which allows custom parameters to be specified.
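
Conceptually, the run step applies each selected algorithm with its default parameters unless the user has supplied overrides. A hypothetical sketch of such a driver follows; all names, functions, and parameters here are illustrative placeholders, not the tool's actual code:

    # Stand-ins for the real selection routines.
    def run_t_test(alpha=0.05):
        return f"t-test with alpha={alpha}"

    def run_fold_change(threshold=2.0):
        return f"fold change with threshold={threshold}"

    # Hypothetical registry: algorithm name -> (callable, default parameters).
    ALGORITHMS = {
        "t_test": (run_t_test, {"alpha": 0.05}),
        "fold_change": (run_fold_change, {"threshold": 2.0}),
    }

    def run_feature_selection(selected, custom_params=None):
        """Apply each selected algorithm with its defaults or user overrides."""
        custom_params = custom_params or {}
        results = {}
        for name in selected:
            func, defaults = ALGORITHMS[name]
            # Overrides apply when "Use Default Parameters" is unchecked.
            params = {**defaults, **custom_params.get(name, {})}
            results[name] = func(**params)
        return results

    print(run_feature_selection(["t_test"], {"t_test": {"alpha": 0.01}}))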

Results Intersection

After different feature selection algorithms have been run, their results can be intersected to identify features that are common across algorithms. The Intersect box lists the executed algorithms, and the user can choose which algorithms' results to intersect. The Intersect button applies the intersection to the selected results. The results can be examined by expanding the Intersection Results box; they are displayed in a table whose last column shows the number of algorithms that selected each feature. The user can also download the intersection results as a CSV file by clicking the Download button.
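
The intersection step can be pictured as counting, for each feature, how many of the chosen algorithms selected it. A minimal pandas sketch under that assumption follows; the algorithm names and feature sets are hypothetical:

    import pandas as pd

    # Hypothetical per-algorithm results: each algorithm yields a feature set.
    results = {
        "t_test":        {"protein_1", "protein_4", "protein_9"},
        "fold_change":   {"protein_1", "protein_9", "protein_12"},
        "random_forest": {"protein_1", "protein_4", "protein_12"},
    }

    # Count how many algorithms selected each feature (the table's last column).
    counts = pd.Series(
        [f for feats in results.values() for f in feats]
    ).value_counts().rename("n_algorithms")

    # Features chosen by every algorithm: the strict intersection.
    common = set.intersection(*results.values())
    print("Selected by all algorithms:", common)

    # Write the table to disk, mirroring the Download button's CSV export.
    counts.to_csv("intersection_results.csv", header=True)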