Module llmsearch.utils.common_utils

Common utilities

Functions

def clone_monkey_patch(estimator, *, safe=True)

Deprecated monkey-patch function used in place of estimator cloning during cross-validation or hyperparameter search.

Usable with scikit-learn versions < 1.3.

  • This function returns the same estimator instead of a copy, as there are no parameters to specifically "fit".
  • This is done to avoid OOM errors with larger models; it does not affect the hyperparameter search in any way.

Args

estimator : BaseEstimator
estimator to clone
safe : bool, optional
redundant parameter for now. Defaults to True.

Returns

BaseEstimator
returns the same estimator
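
The identity behavior described above can be illustrated with a minimal sketch. This is not the library's source: `DummyEstimator` and the `_sketch` suffix are hypothetical names introduced for illustration, and how the patch is installed into scikit-learn is not shown here.

```python
class DummyEstimator:
    """Hypothetical stand-in for a model wrapper that is expensive to copy."""

    def __init__(self, weights):
        self.weights = weights


def clone_monkey_patch_sketch(estimator, *, safe=True):
    # `safe` is accepted only to mirror the documented signature; it is unused.
    # Returning the object itself means no second copy is held in memory.
    return estimator


est = DummyEstimator(weights=[0.1, 0.2, 0.3])
same = clone_monkey_patch_sketch(est)
print(same is est)  # True
```

Because the same object is returned, any state held by the estimator (for example, loaded model weights) is shared rather than duplicated.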
def json_dump(ob, file_path)
def json_load(file_path)
def print_call_stack(n)

Prints the call stack, ordered from the first call to the latest call.

Args

n : int
Number of most recent calls to print; the last one printed is the latest call.
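
One way to get this ordering is the standard-library `traceback` module, whose `extract_stack()` lists frames from oldest to newest. The sketch below is an assumption about how such a helper could work, not the library's implementation; `print_last_calls` is a hypothetical name, and it returns the frames only so the result can be inspected.

```python
import traceback


def print_last_calls(n):
    """Print the n most recent calls leading to this point, oldest first."""
    # extract_stack() orders frames oldest first; drop this helper's own frame.
    stack = traceback.extract_stack()[:-1]
    # Guard n <= 0 explicitly, because stack[-0:] would select every frame.
    frames = stack[-n:] if n > 0 else []
    print("".join(traceback.format_list(frames)), end="")
    return frames


def outer():
    return inner()


def inner():
    return print_last_calls(2)


frames = outer()
```

Here the two printed entries are the call inside `outer` followed by the call inside `inner`, so the latest call appears last.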
def process_dataset_with_map(dataset, sample_preprocessor, tokenizer, input_cols, eval_cols)

Processes the given dataset by mapping a processing function over each sample.

Args

dataset : Dataset
A Hugging Face dataset to be processed.
sample_preprocessor : function
A function to preprocess sample inputs.
tokenizer : function
A tokenizer function to apply to input text.
input_cols : list of str
Column names to be processed for input features.
eval_cols : list of str
Column names for evaluation labels.

Returns

Dataset
A new dataset with original data and additional processed keys _X and _y.
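
The mapping step can be sketched in plain Python, with a list of dictionaries standing in for the Hugging Face dataset. Everything below is an assumption made for illustration: the calling convention of `sample_preprocessor`, the shape of `_X` and `_y`, and the names `process_rows_with_map`, `toy_tokenizer`, and `toy_preprocessor` are all hypothetical.

```python
def process_rows_with_map(rows, sample_preprocessor, tokenizer, input_cols, eval_cols):
    """Return new rows that keep the original keys and add `_X` and `_y`."""

    def add_keys(sample):
        # Assumed convention: the preprocessor receives the tokenizer and the
        # selected input columns, and returns the model-ready input.
        inputs = {col: sample[col] for col in input_cols}
        labels = {col: sample[col] for col in eval_cols}
        return {**sample, "_X": sample_preprocessor(tokenizer, inputs), "_y": labels}

    return [add_keys(sample) for sample in rows]


def toy_tokenizer(text):
    # Stand-in tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def toy_preprocessor(tokenizer, inputs):
    return tokenizer("Question: " + inputs["question"])


rows = [{"question": "What is 2 + 2?", "answer": "4"}]
processed = process_rows_with_map(
    rows, toy_preprocessor, toy_tokenizer, ["question"], ["answer"]
)
print(processed[0])
```

With a real Hugging Face `Dataset`, `Dataset.map` would play the role of the list comprehension: it applies a function to each example and adds the keys of the returned dictionary to that example.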
def yaml_load(file_path)

Load a YAML file from the given file path.

Args

file_path : str
path to yaml file

Returns

dict
loaded yaml file
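
A loader with this signature could be written with PyYAML as sketched below. This assumes PyYAML is installed and uses `yaml.safe_load`; whether the library uses the same loader is not stated here, and `yaml_load_sketch` is a hypothetical name.

```python
import os
import tempfile

import yaml  # PyYAML, assumed to be installed


def yaml_load_sketch(file_path):
    """Read a YAML file and return its contents as Python objects."""
    with open(file_path, "r", encoding="utf-8") as f:
        # safe_load builds only plain Python types (dict, list, str, int, ...).
        return yaml.safe_load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("model:\n  name: demo\n  max_new_tokens: 32\n")
    config = yaml_load_sketch(path)

print(config)
```

A file whose top level is a mapping loads as a `dict`, which matches the documented return type; a top-level YAML list would load as a `list` instead.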