Module llmsearch.utils.common_utils
Common utilities
Functions
def clone_monkey_patch(estimator, *, safe=True)-
Deprecated monkey-patch function used to clone the estimator during cross-validation and hyperparameter search.
Usable with scikit-learn versions below 1.3.
- This function returns the same estimator, as there are no parameters to specifically "fit".
- This is done to avoid OOM errors with larger models; it does not affect the hyperparameter search in any way.
Args
estimator:BaseEstimator- estimator to clone
safe:bool, optional- redundant parameter for now. Defaults to True.
Returns
BaseEstimator- returns the same estimator
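The identity-clone behaviour described above can be illustrated with a minimal sketch. It is a stand-in written for this example, not the library's actual source: `DummyEstimator` is hypothetical, and how the patch is installed into scikit-learn is not shown here because the documentation does not state it.

```python
def clone_monkey_patch_sketch(estimator, *, safe=True):
    # Assumption: mirrors the documented behaviour only.
    # The same object is handed back, so no copy of a large model is made.
    # `safe` is accepted for signature compatibility and otherwise ignored.
    return estimator


class DummyEstimator:
    """Hypothetical placeholder for a large model wrapper."""

    def __init__(self):
        self.weights = [0.0] * 1000


estimator = DummyEstimator()
cloned = clone_monkey_patch_sketch(estimator)

# Both names refer to one object, so memory use does not grow per "clone".
print(cloned is estimator)
```

Because the returned object is the original, any state set on it is shared across all cross-validation folds, which is only acceptable when nothing is fitted.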
def json_dump(ob, file_path)
def json_load(file_path)
def print_call_stack(n)-
Prints call stack from first call to latest call
Args
n:int- number of most recent calls to print; the last one printed is the latest call
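One way such a helper could be written is with the standard `traceback` module. The sketch below is an assumption about the approach, not the library's implementation; the name `print_call_stack_sketch` and the output format are made up for illustration.

```python
import traceback


def print_call_stack_sketch(n):
    # extract_stack() lists frames oldest first.
    # The final entry is this helper itself, so it is dropped.
    frames = traceback.extract_stack()[:-1]
    # Keep only the last n frames; guard against n <= 0, since [-0:] is the whole list.
    recent = frames[-n:] if n > 0 else []
    for frame in recent:
        print(f"{frame.filename}:{frame.lineno} in {frame.name}")


def inner():
    print_call_stack_sketch(2)


def outer():
    inner()


outer()
```

With `n=2` the sketch prints the frame for `outer` followed by the frame for `inner`, so the latest call appears last, matching the ordering described above.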
def process_dataset_with_map(dataset, sample_preprocessor, tokenizer, input_cols, eval_cols)-
Processes the given dataset by mapping a processing function over each sample.
Args
dataset:Dataset- A Hugging Face dataset to be processed.
sample_preprocessor:function- A function to preprocess sample inputs.
tokenizer:function- A tokenizer function to apply to input text.
input_cols:list of str- Column names to be processed for input features.
eval_cols:list of str- Column names for evaluation labels.
Returns
Dataset- A new dataset with original data and additional processed keys _X and _y.
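The per-sample mapping can be sketched in plain Python over a list of dicts, which avoids depending on the Hugging Face `datasets` package. Everything below is an assumption made for illustration: the way `sample_preprocessor` is called (tokenizer first, then the input columns as keyword arguments), the layout of `_y` as a dict of the evaluation columns, and the toy helper functions. The real function operates on a `Dataset` and may differ in these details.

```python
def process_rows_sketch(rows, sample_preprocessor, tokenizer, input_cols, eval_cols):
    processed = []
    for row in rows:
        # Collect only the columns that feed the model input.
        inputs = {col: row[col] for col in input_cols}
        new_row = dict(row)  # keep the original data
        # Assumed call convention: tokenizer first, input columns as keywords.
        new_row["_X"] = sample_preprocessor(tokenizer, **inputs)
        # Assumed label layout: a dict holding the evaluation columns.
        new_row["_y"] = {col: row[col] for col in eval_cols}
        processed.append(new_row)
    return processed


def toy_tokenizer(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def toy_preprocessor(tokenizer, question):
    # Hypothetical preprocessor: build a prompt, then tokenize it.
    return tokenizer(f"Q: {question}")


rows = [{"question": "What is 2 + 2?", "answer": "4"}]
result = process_rows_sketch(
    rows, toy_preprocessor, toy_tokenizer, ["question"], ["answer"]
)

print(result[0]["_X"])
print(result[0]["_y"])
```

Each output row keeps its original columns and gains `_X` and `_y`, mirroring the return value described above.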
def yaml_load(file_path)-
Load yaml file from file path
Args
file_path:str- path to yaml file
Returns
dict- loaded yaml file
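A loader with this signature could be sketched as follows, assuming the PyYAML package is installed. The use of `yaml.safe_load` is an assumption (the documentation does not say which loader is used), and the file name and keys are invented for the example.

```python
import os
import tempfile

import yaml  # PyYAML, assumed to be installed


def yaml_load_sketch(file_path):
    # Assumption: safe_load is used so that arbitrary Python objects are not constructed.
    with open(file_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


# Write a small hypothetical config to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.yaml")
    with open(config_path, "w", encoding="utf-8") as f:
        f.write("model: toy\nmax_new_tokens: 32\n")
    config = yaml_load_sketch(config_path)

print(config)
```

For a file whose top level is a mapping, the result is a plain `dict`, consistent with the documented return type.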