Module `llmsearch.utils.common_utils`

Common utilities

Functions

def clone_monkey_patch(estimator, *, safe=True)

Deprecated monkey-patch replacement for the function that clones the estimator during cross-validation/hyperparameter search.

Usable with scikit-learn versions earlier than 1.3.

This function returns the same estimator instead of a copy, as there are no parameters to specifically "fit".
Skipping the copy avoids out-of-memory (OOM) errors with larger models and does not affect the hyperparameter search in any way.
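The pass-through behaviour described above can be illustrated with a minimal sketch. The names `clone_passthrough` and `DummyEstimator` are hypothetical stand-ins, and how the real function is wired into scikit-learn is not shown in this reference, so that step is left out.

```python
def clone_passthrough(estimator, *, safe=True):
    # Hypothetical stand-in mirroring the documented signature.
    # `safe` is accepted only for signature compatibility and is ignored.
    # No copy is made: the very same object is handed back.
    return estimator


class DummyEstimator:
    """Placeholder for a large model wrapper (an assumption for this example)."""

    def __init__(self):
        # Stands in for model weights that would be expensive to duplicate.
        self.weights = [0.0] * 1000


estimator = DummyEstimator()
returned = clone_passthrough(estimator)

# Identity, not equality: no second copy of the weights exists in memory.
print(returned is estimator)
```

Because the search procedure receives the same object on every call, memory use stays at one model instance regardless of how many candidates or folds are evaluated.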

def json_dump(ob, file_path)

def json_load(file_path)
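`json_dump` and `json_load` carry no description in this reference. The sketch below shows one plausible shape for such helpers, assuming they are thin wrappers around the standard `json` module. The function bodies, the UTF-8 encoding, and the indentation setting are all assumptions, not the library's confirmed behaviour.

```python
import json
import os
import tempfile


def json_dump_sketch(ob, file_path):
    # Assumed behaviour: serialize `ob` as JSON to `file_path`.
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(ob, f, indent=2)


def json_load_sketch(file_path):
    # Assumed behaviour: read `file_path` and return the parsed JSON object.
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.json")
    json_dump_sketch({"temperature": 0.7, "top_k": 50}, path)
    loaded = json_load_sketch(path)
```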

def print_call_stack(n)

Prints the call stack, ordered from the first call to the latest call.
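The oldest-to-newest ordering can be sketched with the standard `traceback` module. The meaning of `n` is not documented here. This example assumes it is the number of most recent frames to show, and `call_stack_names` is a hypothetical helper that returns the frame names instead of printing them so the result is easy to inspect.

```python
import traceback


def call_stack_names(n):
    # extract_stack returns frames ordered oldest-first; `limit` keeps only
    # the most recent ones. One extra frame is requested and then dropped,
    # because the last entry is this helper itself.
    frames = traceback.extract_stack(limit=n + 1)[:-1]
    return [frame.name for frame in frames]


def outer():
    return inner()


def inner():
    # Ask for the two most recent callers: `outer`, then `inner`.
    return call_stack_names(2)


names = outer()
for name in names:
    print(name)
```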

def process_dataset_with_map(dataset, sample_preprocessor, tokenizer, input_cols, eval_cols)

Processes the given dataset by mapping a processing function over each sample.

Parameters:
dataset (Dataset): A Hugging Face dataset to be processed.
sample_preprocessor (function): A function to preprocess sample inputs.
tokenizer (function): A tokenizer function to apply to input text.
input_cols (list of str): Column names to be processed for input features.
eval_cols (list of str): Column names for evaluation labels.

Returns:
Dataset: A new dataset containing the original data plus the additional processed keys _X and _y.
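The per-sample mapping described above can be sketched without the Hugging Face dependency by using plain dictionaries as rows. Everything in this example is an assumption made for illustration: the helper `make_sample_mapper`, the calling convention `sample_preprocessor(tokenizer, **inputs)`, the shape of `_y`, and the toy preprocessor and tokenizer.

```python
def make_sample_mapper(sample_preprocessor, tokenizer, input_cols, eval_cols):
    # Build the function that would be mapped over each sample.
    def add_processed_keys(sample):
        inputs = {col: sample[col] for col in input_cols}
        out = dict(sample)  # keep the original columns untouched
        # Assumed convention: the preprocessor receives the tokenizer
        # followed by the input columns as keyword arguments.
        out["_X"] = sample_preprocessor(tokenizer, **inputs)
        # Assumed convention: evaluation columns are grouped under "_y".
        out["_y"] = {col: sample[col] for col in eval_cols}
        return out

    return add_processed_keys


def toy_tokenizer(text):
    # Stand-in for a real tokenizer: only trims surrounding whitespace.
    return text.strip()


def toy_preprocessor(tokenizer, question):
    # Stand-in preprocessor: wraps the question in a simple prompt template.
    return tokenizer(f"Q: {question}\nA:")


rows = [{"question": "2 + 2?", "answer": "4"}]
mapper = make_sample_mapper(toy_preprocessor, toy_tokenizer, ["question"], ["answer"])

# With a real Hugging Face dataset, `mapper` would be passed to `dataset.map`.
processed = [mapper(row) for row in rows]
print(processed[0])
```

Returning a copy of each sample with two extra keys keeps the original columns available for later inspection while giving downstream code fixed names to read from.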

def yaml_load(file_path)

Loads a YAML file from the given file path.