Module llmsearch.utils.common_utils
Common utilities
Functions
def clone_monkey_patch(estimator, *, safe=True)
-
Deprecated monkey-patch function for cloning the estimator during cross-validation/hyperparameter search.
Usable with scikit-learn versions < 1.3.
- This function returns the same estimator, as there are no parameters to specifically "fit".
- This is done to avoid OOM errors for larger models; it does not affect the hyperparameter search in any way.
Args
estimator
:BaseEstimator
- estimator to clone
safe
:bool, optional
- redundant parameter for now. Defaults to True.
Returns
BaseEstimator
- returns the same estimator
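The behaviour described above amounts to an identity function. The following is a minimal sketch of that idea, not the library's actual source: `DummyEstimator` is a hypothetical stand-in for a large model wrapper, and how the function is patched into scikit-learn is not shown.

```python
class DummyEstimator:
    """Hypothetical stand-in for a large model wrapper (not part of llmsearch)."""

    def __init__(self, name):
        self.name = name


def clone_monkey_patch(estimator, *, safe=True):
    # Return the very same object instead of building a fresh copy,
    # so a second copy of a large model is never held in memory.
    # `safe` is accepted only to keep the signature compatible; it is ignored.
    return estimator


model = DummyEstimator("large-model")
same = clone_monkey_patch(model)
print(same is model)
```

Because the returned object is the original one, any state set on the estimator is shared across every "clone", which is acceptable only when there is nothing to fit.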
def json_dump(ob, file_path)
def json_load(file_path)
def print_call_stack(n)
-
Prints call stack from first call to latest call
Args
n
:int
- number of most recent calls to print; the last one printed is the latest call
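One way to get this ordering is the standard library's `traceback.extract_stack`, which lists frames from the outermost call to the current one. The sketch below is an assumption about how such a helper could work, not the library's implementation; `format_last_calls`, `print_call_stack_sketch`, `inner` and `outer` are hypothetical names.

```python
import traceback


def format_last_calls(n):
    # extract_stack() orders frames from the first call to the latest one;
    # drop the final entry, which is this helper itself.
    frames = traceback.extract_stack()[:-1]
    # Keep only the last n frames (guarding n <= 0, since frames[-0:] is everything).
    selected = frames[-n:] if n > 0 else []
    return [f"{f.filename}:{f.lineno} in {f.name}" for f in selected]


def print_call_stack_sketch(n):
    # This function adds one frame of its own, so request n + 1 and drop it.
    for line in format_last_calls(n + 1)[:-1]:
        print(line)


def inner():
    print_call_stack_sketch(2)


def outer():
    inner()


outer()  # prints the frame for outer, then the frame for inner
```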
def process_dataset_with_map(dataset, sample_preprocessor, tokenizer, input_cols, eval_cols)
-
Processes the given dataset by mapping a processing function over each sample.
Args
dataset
:Dataset
- A Hugging Face dataset to be processed.
sample_preprocessor
:function
- A function to preprocess sample inputs.
tokenizer
:function
- A tokenizer function to apply to input text.
input_cols
:list of str
- Column names to be processed for input features.
eval_cols
:list of str
- Column names for evaluation labels.
Returns
Dataset
- A new dataset with original data and additional processed keys _X and _y.
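The per-sample mapping can be illustrated without the Hugging Face library by using a plain list of dicts. This is a sketch under stated assumptions, not the library's code: it assumes `sample_preprocessor` takes a dict of the input columns and returns a string, that `tokenizer` is any callable applied to that string, and that `_y` holds the evaluation columns. `process_sample` and `process_rows` are hypothetical names.

```python
def process_sample(sample, sample_preprocessor, tokenizer, input_cols, eval_cols):
    # Assumed behaviour: build the model input from the input columns only.
    inputs = {col: sample[col] for col in input_cols}
    text = sample_preprocessor(inputs)
    # Assumed behaviour: keep the evaluation columns together as the label.
    labels = {col: sample[col] for col in eval_cols}
    # Original fields are preserved; `_X` and `_y` are added alongside them.
    return {**sample, "_X": tokenizer(text), "_y": labels}


def process_rows(rows, sample_preprocessor, tokenizer, input_cols, eval_cols):
    # Stand-in for mapping the per-sample function over a dataset.
    return [
        process_sample(row, sample_preprocessor, tokenizer, input_cols, eval_cols)
        for row in rows
    ]


rows = [{"question": "2 + 2?", "answer": "4"}]
processed = process_rows(
    rows,
    sample_preprocessor=lambda d: "Q: " + d["question"],  # hypothetical preprocessor
    tokenizer=str.split,  # whitespace split as a toy tokenizer
    input_cols=["question"],
    eval_cols=["answer"],
)
print(processed[0]["_X"])
print(processed[0]["_y"])
```

With a real Hugging Face `Dataset`, a per-sample function of this shape would typically be passed to `Dataset.map` rather than applied in a list comprehension.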
def yaml_load(file_path)
-
Load a YAML file from the given file path
Args
file_path
:str
- path to yaml file
Returns
dict
- loaded yaml file