This function trains and evaluates a Random Forest classification model on cytokine data. It includes variable importance visualization, cross-validation for feature selection, and performance metrics such as accuracy, sensitivity, and specificity. For binary classification the function can also plot the ROC curve and compute the AUC.
Usage
cyt_rf(
data,
group_col,
ntree = 500,
mtry = 5,
train_fraction = 0.7,
plot_roc = FALSE,
k_folds = 5,
step = 0.5,
run_rfcv = TRUE,
verbose = FALSE,
seed = 123,
cv = FALSE,
cv_folds = 5,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
output_file = NULL,
font_settings = NULL,
progress = NULL
)Arguments
- data
A data frame containing the cytokine measurements. One column should correspond to the grouping variable (the outcome) and the remaining columns should be numeric predictors.
- group_col
A string naming the column in
datathat contains the grouping variable.- ntree
Integer specifying the number of trees to grow. Default is
500.- mtry
Integer specifying the number of variables randomly sampled at each split. Default is
5.- train_fraction
Numeric between 0 and 1 giving the proportion of data used for training. Default is
0.7.- plot_roc
Logical. If
TRUEand the problem is binary, an ROC curve and AUC are computed and returned. Default isFALSE.- k_folds
Integer specifying the number of folds for
rfcvwhenrun_rfcv = TRUE. Default is5.- step
Numeric specifying the fraction of variables removed at each step during
rfcv. Default is0.5.- run_rfcv
Logical indicating whether to run Random Forest cross-validation for feature selection. Default is
TRUE.- verbose
Logical. When
TRUE, training and test performance metrics, confusion matrices, and cross-validation details are printed. Default isFALSE.- seed
Optional integer seed for reproducibility. Default is
123.- cv
Logical indicating whether to perform a separate k-fold classification cross-validation using
caret. Default isFALSE.- cv_folds
Integer specifying the number of folds for classification cross-validation when
cv = TRUE. Default is5.- scale
Character string specifying a transformation to apply to numeric predictor columns prior to model fitting. One of
"none"(default),"log2","log10","zscore", or"custom".- custom_fn
A custom transformation function used when
scale = "custom".- output_file
Optional. A file path to save outputs as a PDF. If
NULL(default), a named list is returned for interactive display.- font_settings
Optional named list of font sizes for supported plot text elements.
- progress
Optional. A Shiny
Progressobject for reporting progress updates.
Value
When output_file is NULL, an invisible named list
with elements summary_text, vip_plot, roc_plot,
rfcv_plot, rfcv_data, importance_data, and
cv_results. When output_file is provided, a PDF is
written and the function returns NULL invisibly.
Examples
data.df0 <- ExampleData1
data.df <- data.frame(data.df0[, 1:3], log2(data.df0[, -c(1:3)]))
data.df <- data.df[, -c(2:3)]
data.df <- dplyr::filter(data.df, Group != "ND")
cyt_rf(data = data.df, group_col = "Group", k_folds = 5, ntree = 1000,
mtry = 4, run_rfcv = TRUE, plot_roc = TRUE, verbose = FALSE)
