Module llmsearch.utils.mem_utils
Inspired by toma; memory-related utils for memory-friendly inference.
Functions
def batch_without_oom_error(func)-
Perform inference on a batch of samples, dividing batch_size by 2 each time an OOM error happens. The function should have a
batch_size and disable_batch_size_cache parameter.
Args
func:Callable- function with this signature of arguments:
*args, batch_size, disable_batch_size_cache, **kwargs
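A minimal sketch of the retry-and-halve behaviour described above, assuming OOM conditions surface as a RuntimeError whose message contains "out of memory"; the `fake_inference` function and its memory limit are purely illustrative.

```python
import functools


def batch_without_oom_error(func):
    """Sketch: call func, halving batch_size whenever an OOM-like error occurs."""
    @functools.wraps(func)
    def wrapper(*args, batch_size, disable_batch_size_cache=False, **kwargs):
        while True:
            try:
                return func(
                    *args,
                    batch_size=batch_size,
                    disable_batch_size_cache=disable_batch_size_cache,
                    **kwargs,
                )
            except RuntimeError as exc:
                # Halve and retry only for OOM-like errors, down to batch_size == 1.
                if "out of memory" in str(exc) and batch_size > 1:
                    batch_size //= 2
                else:
                    raise
    return wrapper


@batch_without_oom_error
def fake_inference(samples, batch_size, disable_batch_size_cache=False):
    # Hypothetical model: pretend any batch larger than 4 exhausts memory.
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory")
    return batch_size
```

Starting from `batch_size=32`, the wrapper retries at 16, 8, and finally 4, where the call succeeds.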
def gc_cuda()-
Garbage collect RAM & Torch (CUDA) memory.
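A sketch of what such a helper typically does: run Python's garbage collector, then release PyTorch's cached CUDA allocations. The torch import is guarded so the sketch also runs where torch is absent.

```python
import gc


def gc_cuda():
    """Garbage collect Python objects, then free cached CUDA memory if torch is available."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
```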
def get_gpu_information()-
Get CUDA GPU related info if a GPU exists
Returns
Union[None, Tuple[int, float, float]]- total number of available GPUs, total occupied GPU memory (GB), total available GPU memory (GB)
None if unable to get CUDA GPU related info
def get_total_available_ram()-
Get total available RAM in GB
Returns
float- available ram in GB
def get_traceback(ignore_first=0, stack_context=5)-
Get the traceback from the first to the latest call
Args
ignore_first:int, optional- ignore the first n traceback entries. Defaults to 0.
stack_context:int, optional- amount of context for the traceback. Defaults to 5.
Returns
Tuple[Tuple]- tuples of function call and code
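A sketch of how such a helper can be built on the standard library's traceback module, returning (function name, source line) pairs; the exact frame fields kept here are an assumption.

```python
import traceback


def get_traceback(ignore_first=0, stack_context=5):
    # Capture the current call stack, dropping this function's own frame.
    stack = traceback.extract_stack()[:-1]
    # Skip the first `ignore_first` entries and keep up to `stack_context` frames.
    stack = stack[ignore_first:ignore_first + stack_context]
    # Each entry pairs the calling function's name with the source line of the call.
    return tuple((frame.name, frame.line) for frame in stack)
```

A tuple of hashable pairs like this can serve as part of a cache key, which is how the Cache class below uses stacktraces.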
def is_cuda_out_of_memory(exception)-
Checks for CUDA OOM Error
def is_cudnn_snafu(exception)-
For/because of https://github.com/pytorch/pytorch/issues/4107
def is_out_of_cpu_memory(exception)-
Checks for CPU OOM Error
def should_reduce_batch_size(exception)-
Checks whether batch size can be reduced or not
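A sketch of how these checks are commonly implemented (toma takes a similar approach): inspect the exception type and message for known OOM signatures. The exact substrings matched here are assumptions based on typical PyTorch error messages.

```python
def is_cuda_out_of_memory(exception):
    # PyTorch raises RuntimeError with this phrase on GPU OOM.
    return isinstance(exception, RuntimeError) and "CUDA out of memory" in str(exception)


def is_cudnn_snafu(exception):
    # Workaround for https://github.com/pytorch/pytorch/issues/4107, where a
    # cuDNN failure can mask an underlying OOM condition.
    return isinstance(exception, RuntimeError) and "cudnn" in str(exception).lower()


def is_out_of_cpu_memory(exception):
    # PyTorch CPU allocation failures mention the default CPU allocator.
    return isinstance(exception, RuntimeError) and (
        "DefaultCPUAllocator: can't allocate memory" in str(exception)
    )


def should_reduce_batch_size(exception):
    # Reducing the batch size only helps for memory-related failures.
    return (
        is_cuda_out_of_memory(exception)
        or is_cudnn_snafu(exception)
        or is_out_of_cpu_memory(exception)
        or isinstance(exception, MemoryError)
    )
```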
Classes
class Cache-
Cache to store the optimal batch size for a specific configuration call, keyed on traceback and memory information
Initializes cache
Methods
def empty_cache(self)-
Empties cache
def get_value(self, current_value, stacktrace, total_available_gpu_memory, total_available_ram_memory)-
Tries to get a value for a particular set of hashes; returns
current_value if no cached value exists (this only happens during the initial call).
Args
current_value:Any- current value
stacktrace:Tuple- stacktrace of the method call
total_available_gpu_memory:float- total available gpu memory
total_available_ram_memory:float- total available ram memory
Returns
float- current_value if the hash key is not present, else the value stored under the hashed key
def is_empty(self)-
Checks if the cache is empty
Returns
bool- empty or not
def set_value(self, value, stacktrace, total_available_gpu_memory, total_available_ram_memory)-
Sets a value for a particular combination of hashes
Args
value:Any- value to store under the hash
stacktrace:Tuple- stacktrace of the method call
total_available_gpu_memory:float- total available gpu memory
total_available_ram_memory:float- total available ram memory
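The interface above can be sketched as a dictionary keyed on a hash of the stacktrace and the available GPU/RAM figures; this is a minimal illustration of the documented methods, not the library's actual implementation.

```python
class Cache:
    """Sketch: cache optimal batch sizes per (stacktrace, memory state) combination."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _hash_key(stacktrace, total_available_gpu_memory, total_available_ram_memory):
        # Combine the call site and memory state into a single hashable key.
        return hash((stacktrace, total_available_gpu_memory, total_available_ram_memory))

    def is_empty(self):
        return not self._store

    def empty_cache(self):
        self._store.clear()

    def get_value(self, current_value, stacktrace,
                  total_available_gpu_memory, total_available_ram_memory):
        # Fall back to current_value when the key has never been set (initial call).
        key = self._hash_key(stacktrace, total_available_gpu_memory, total_available_ram_memory)
        return self._store.get(key, current_value)

    def set_value(self, value, stacktrace,
                  total_available_gpu_memory, total_available_ram_memory):
        key = self._hash_key(stacktrace, total_available_gpu_memory, total_available_ram_memory)
        self._store[key] = value
```

With this layout, the same call site re-run under the same memory conditions retrieves the batch size it previously settled on, skipping the halving search.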