Population Based Training in Ray Tune
Hyperparameter tuning is a key step in model selection. Hyperparameters are like settings: if you do not choose them carefully, the model's results suffer. Tuning can be done manually or automatically. Today, thanks to greater computational capabilities, the large number of hyperparameters, the wide variety of algorithms, and helper libraries like Ray, the preferred way is to tune hyperparameters automatically.
In this article, we’ll talk about Population Based Training, explore Ray Tune, and see an example of hyperparameter tuning. GitHub Repo: https://github.com/lukakap/pbt-tune.git
What PBT means
As we have already mentioned, good model performance depends on the correct selection of hyperparameters. Population Based Training (PBT) is one of the more elegant approaches to choosing them. It consists of two parts: random search and clever selection. In the random search step, the algorithm samples several hyperparameter combinations at random. Most of these combinations will likely score poorly, while a small portion will perform well. Here is where clever selection comes in. This second step runs in a cycle until we achieve the desired result or exhaust the iteration budget. It contains two main methods: exploit and explore. Exploit replaces a combination of hyperparameters with a more promising one, based on the performance metric. Explore randomly perturbs the hyperparameters (in most cases by multiplying them by some factor) to add noise.
Population Based Training lets us do two meaningful things at once: parallelize the training of many hyperparameter combinations, and let each trial learn from the rest of the population, which yields promising results quickly. A schematic sketch of this exploit/explore cycle follows.
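Here is a minimal, schematic sketch of the PBT loop in plain Python. This is pseudocode rather than Ray Tune API: train_one_step, evaluate, state, and config are hypothetical members of a population element.

import copy
import random

def pbt_loop(population, n_iterations, perturb_factors=(0.8, 1.2)):
    # Schematic PBT: train everyone, then exploit + explore the weakest trials.
    for _ in range(n_iterations):
        for member in population:
            member.train_one_step()           # hypothetical training step
            member.score = member.evaluate()  # hypothetical evaluation
        population.sort(key=lambda m: m.score, reverse=True)
        quarter = max(1, len(population) // 4)
        top, bottom = population[:quarter], population[-quarter:]
        for weak in bottom:
            strong = random.choice(top)
            weak.state = copy.deepcopy(strong.state)  # exploit: clone a top performer
            weak.config = dict(strong.config)
            for name in weak.config:                  # explore: perturb hyperparameters
                weak.config[name] *= random.choice(perturb_factors)
    return max(population, key=lambda m: m.score)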
A Look at Ray Tune
Ray Tune is a Ray-based Python library for hyperparameter tuning that ships with up-to-date algorithms such as PBT. We will work with Ray version 2.1.0; the details can be found in the project's release notes. We will also point out important API changes along the way.
Before moving on to practical examples, let's go over some basic concepts.
Trainable: the objective that the algorithm evaluates configurations against. It can use the Class API or the Function API; the Ray Tune documentation recommends the Function API (a minimal Function API sketch follows this list).
Search space: the ranges of values for the hyperparameters.
Trial: the Tuner generates several configurations and runs a process for each of them; the process run on one configuration is called a trial.
Search algorithm: suggests hyperparameter configurations; by default, Tune uses random search.
Scheduler: based on the results reported during training, a scheduler decides whether a trial should stop or continue.
Checkpointing: saving intermediate results, which is necessary to resume and then continue training.
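As a taste of the Function API, a trainable can be as small as the sketch below. train_and_evaluate is a hypothetical helper, and in Ray 2.1 metrics are reported through ray.air.session from inside a Tune run.

from ray.air import session

def trainable_fn(config):
    # Minimal Function API trainable: train, evaluate, report each iteration.
    for _ in range(30):
        score = train_and_evaluate(config)   # hypothetical helper
        session.report({"r2_score": score})  # hand the metric back to Tune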
In most cases a search algorithm and a scheduler can be used together in the tuning process, but there are exceptions. One case where they are not used together is Population Based Training. In the Ray Tune docs, PBT sits in the schedulers section, but it is really both at the same time: it is a scheduler because it stops trials based on their results, and a searcher because it contains the logic to create new configurations.
We use the well-known Boston Housing Dataset (https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html), which we can import from sklearn. (Note that load_boston is deprecated in recent scikit-learn releases; it still works in the versions contemporary with Ray 2.1.0.)
One meaningful change in Ray Tune was the execution API: tune.run() has been replaced by Tuner(...).fit(). Before the update, we passed all parameters separately; the new version introduces config classes, which simplifies things considerably. First, related parameters are grouped together, which makes the execution code easier to read and understand. Second, when you use Ray Tune in a project, some configurations are shared between algorithms, so you can build one common config object and pass it around, which makes life easier. A sketch of the old and new calls follows.
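Roughly, the migration looks like this. This is a schematic comparison: trainable, search_space, and pbt stand in for your own trainable, search space dict, and scheduler.

# Before Ray 2.x (legacy API, sketch):
# analysis = tune.run(trainable, config=search_space, num_samples=10, scheduler=pbt)

# With Ray 2.x config classes:
from ray.tune.tune_config import TuneConfig
from ray.tune.tuner import Tuner

tuner = Tuner(
    trainable,
    param_space=search_space,
    tune_config=TuneConfig(num_samples=10, scheduler=pbt),
)
results = tuner.fit()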
Imports
# imports
import json
import os
from joblib import dump, load
from lightgbm import LGBMRegressor
from ray import tune
from ray.air.config import CheckpointConfig
from ray.air.config import RunConfig
from ray.tune.schedulers import PopulationBasedTraining
from ray.tune.tune_config import TuneConfig
from ray.tune.tuner import Tuner
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
Implementation of Trainable
Let's start with the trainable. As we already mentioned, there are two trainable APIs: function-based and class-based. We will write our trainable with the Class API.
class TrainableForPBT(tune.Trainable):
    def setup(self, config):
        pass

    def step(self):
        pass
tune.Trainable is the base class for class-based trainables. We need to override at least two methods: setup and step. setup is invoked once, when training starts, while step is invoked on every training iteration, so unlike setup it can be called many times.
In setup we need x_train and y_train so we can estimate the efficiency of the trial's model in future steps. setup belongs to the parent class (tune.Trainable), but it allows us to add extra arguments. We also need to initialize the LightGBM regressor there. We are going to retrain the model on every iteration, but on the first one we just want to fit it, hence we need to track which iteration we are on. Nothing more at this point.
def setup(self, config, x_train, y_train):
    self.current_config = config
    self.x_train = x_train
    self.y_train = y_train
    # initialize the model from the current hyperparameters
    self.model = LGBMRegressor(**self.current_config)
    # track which iteration we are on
    self.current_iteration = 0
    self.current_step_score = None
What do we do in step? We estimate the current configuration's efficiency and return the score. We will use cross-validation with r2 and return that score, so after each iteration PBT will know the scores associated with all configurations and can make its perturbation decisions based on them. We should also refit the model, continuing from the previous one if it is not the first iteration.
def step(self):
    self.current_iteration += 1
    if self.current_iteration == 1:
        self.model.fit(self.x_train, self.y_train)
    else:
        # continue training from the previous iteration's model
        self.model.fit(self.x_train, self.y_train, init_model=self.model)
    self.current_step_score = cross_val_score(estimator=self.model, X=self.x_train,
                                              y=self.y_train, scoring='r2', cv=5).mean()
    results_dict = {"r2_score": self.current_step_score}
    return results_dict
After overriding the two main functions, PBT needs a few more methods. For the exploitation process we need to save and load checkpoints.
Let's start with save_checkpoint. We will use the joblib library for saving and restoring the model. What do we need to save? First of all, the model, since we always need the previous iteration's model (init_model) for the next iteration; we also save the current iteration number and the current step score.
def save_checkpoint(self, tmp_checkpoint_dir):
    path = os.path.join(tmp_checkpoint_dir, "checkpoint")
    with open(path, "w") as f:
        f.write(json.dumps(
            {"current_score": self.current_step_score, "current_step": self.current_iteration}))
    path_for_model = os.path.join(tmp_checkpoint_dir, 'model.joblib')
    dump(self.model, path_for_model)
    return tmp_checkpoint_dir
We restore the same things in load_checkpoint.
def load_checkpoint(self, tmp_checkpoint_dir):
    with open(os.path.join(tmp_checkpoint_dir, "checkpoint")) as f:
        state = json.loads(f.read())
    self.current_step_score = state["current_score"]
    self.current_iteration = state["current_step"]
    path_for_model = os.path.join(tmp_checkpoint_dir, 'model.joblib')
    self.model = load(path_for_model)
The trainable class above can be considered complete. But we can still improve the training time with the reuse_actors feature.
During training we have as many Trainables (Ray actors) as configuration samples, and each one needs several seconds to start. With reuse_actors, an already started Trainable can be reused for multiple new configurations, so we need fewer Trainables and spend less time on start-up.
Let's implement reset_config, which delivers the new hyperparameters. In reset_config, every variable needs to be adjusted to the new hyperparameters; it is like a fresh setup. There is one tricky question: every time a different configuration takes over the same Trainable, does it start the process from scratch, given that reset_config looks like a restart? Actually, no: after reset_config, the Trainable calls load_checkpoint if a checkpoint exists, so training continues from the last checkpoint.
def reset_config(self, new_config):
    self.current_config = new_config
    self.current_iteration = 0
    self.current_step_score = None
    self.model = LGBMRegressor(**self.current_config)
    return True
We have completed the implementation of the trainable. The finished class looks like this:
class TrainableForPBT(tune.Trainable):
    def setup(self, config, x_train, y_train):
        self.current_config = config
        self.x_train = x_train
        self.y_train = y_train
        # initialize the model from the current hyperparameters
        self.model = LGBMRegressor(**self.current_config)
        # track which iteration we are on
        self.current_iteration = 0
        self.current_step_score = None

    def step(self):
        self.current_iteration += 1
        if self.current_iteration == 1:
            self.model.fit(self.x_train, self.y_train)
        else:
            # continue training from the previous iteration's model
            self.model.fit(self.x_train, self.y_train, init_model=self.model)
        self.current_step_score = cross_val_score(estimator=self.model, X=self.x_train,
                                                  y=self.y_train, scoring='r2', cv=5).mean()
        results_dict = {"r2_score": self.current_step_score}
        return results_dict

    def save_checkpoint(self, tmp_checkpoint_dir):
        path = os.path.join(tmp_checkpoint_dir, "checkpoint")
        with open(path, "w") as f:
            f.write(json.dumps(
                {"current_score": self.current_step_score, "current_step": self.current_iteration}))
        path_for_model = os.path.join(tmp_checkpoint_dir, 'model.joblib')
        dump(self.model, path_for_model)
        return tmp_checkpoint_dir

    def load_checkpoint(self, tmp_checkpoint_dir):
        with open(os.path.join(tmp_checkpoint_dir, "checkpoint")) as f:
            state = json.loads(f.read())
        self.current_step_score = state["current_score"]
        self.current_iteration = state["current_step"]
        path_for_model = os.path.join(tmp_checkpoint_dir, 'model.joblib')
        self.model = load(path_for_model)

    def reset_config(self, new_config):
        self.current_config = new_config
        self.current_iteration = 0
        self.current_step_score = None
        self.model = LGBMRegressor(**self.current_config)
        return True
Execute Experiment
Now we can create some configurations and run the Tune experiment. Tuner has four parameters: trainable, param_space, tune_config, and run_config. The trainable is already implemented, so let's define param_space.
param_space is the same as the search space mentioned earlier. First, we need to define the list of parameters we are going to tune. To keep it simple, we choose three: learning_rate, num_leaves, and max_depth.
Tune has its own Search Space API, so we should use it when defining search spaces. The names of the search space functions are intuitive, so let's see the result without further ado.
param_space = {
    "params": {
        "learning_rate": tune.loguniform(1e-5, 1e-1),  # log-uniform between 0.00001 and 0.1
        "num_leaves": tune.randint(5, 100),            # integer from 5 to 99 (upper bound exclusive)
        "max_depth": tune.randint(1, 9),               # integer from 1 to 8 (upper bound exclusive)
    },
}
The next thing to define is tune_config. But before that, we need to create the scheduler: a PopulationBasedTraining object.
The first parameter of the PopulationBasedTraining scheduler is time_attr, the training result attribute used for comparison; it should be something that increases monotonically. We choose training_iteration, so whenever we mention time_attr below, it means training_iteration. The remaining parameters:
perturbation_interval: how often the perturbation should occur. If we perturb often, we need to save checkpoints just as often, so let's choose 4.
burn_in_period: perturbation will not happen until this number of intervals (iterations) has passed. Cloning the state of top performers onto poorly performing trials from the very beginning would be premature, since performance scores are unstable in the early stages, so we give trials a burn_in_period of 10 iterations before perturbation starts.
hyperparam_mutations: a dict of the hyperparameters that may be perturbed, together with their search spaces. We want to perturb all hyperparameters from the param_space dict, so hyperparam_mutations is simply param_space["params"].
We will not pass the mode and metric arguments to PopulationBasedTraining, since we define them in TuneConfig.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=4,
    burn_in_period=10,
    hyperparam_mutations=param_space["params"],
)
In TuneConfig we need to pass metric, the name of the score reported from the trainable ("r2_score" in our case), and mode, which takes one of two values, min or max, depending on whether the objective is to minimize or maximize the metric. As already mentioned, we don't have a search algorithm, and our scheduler is pbt (the PopulationBasedTraining object). reuse_actors should be True as well. num_samples is the number of hyperparameter samples Tune should draw from the search space.
tune_config = TuneConfig(metric="r2_score", mode="max", search_alg=None,
                         scheduler=pbt, num_samples=15, reuse_actors=True)
Next we define RunConfig, which contains a CheckpointConfig, so we first create the CheckpointConfig and then the RunConfig. In CheckpointConfig, checkpoint_score_attribute and checkpoint_score_order play the same roles as metric and mode in TuneConfig. We choose checkpoint_frequency equal to perturbation_interval, and also save the last checkpoint at the end of training with checkpoint_at_end=True.
checkpoint_config = CheckpointConfig(checkpoint_score_attribute="r2_score",
                                     checkpoint_score_order="max",
                                     checkpoint_frequency=4,
                                     checkpoint_at_end=True)
In RunConfig we can pass the name of the experiment and local_dir, the directory where training results are saved. This is useful if we want to restore or continue the experiment in the future (see the restore sketch after the snippet below). We also add a simple stopping criterion: stop after 30 iterations.
run_config = RunConfig(name="pbt_experiment",
                       local_dir="./ray_results",  # any local results directory
                       stop={"training_iteration": 30},
                       checkpoint_config=checkpoint_config)
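As an aside, should the run be interrupted, it can later be resumed from that directory. A minimal sketch, assuming the Ray 2.1.0 Tuner.restore(path) signature:

# Resume an interrupted experiment from <local_dir>/<experiment name>.
restored_tuner = Tuner.restore(os.path.join("./ray_results", "pbt_experiment"))
results = restored_tuner.fit()  # picks up unfinished trials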
It's time to create the Tuner. Because our trainable takes extra arguments, we need to wrap it with tune.with_parameters before passing it to Tuner.
X, y = load_boston(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

trainable_with_parameters = tune.with_parameters(TrainableForPBT, x_train=x_train, y_train=y_train)
tuner = Tuner(trainable_with_parameters, param_space=param_space["params"],
              tune_config=tune_config, run_config=run_config)
analysis = tuner.fit()
Get Results from Training
Now we can interact with the results through the ResultGrid object (analysis). Using get_best_result we can get the best result across all trials. Here are some useful fields, and after that we load the best model back from its checkpoint.
best_trial_id = analysis._experiment_analysis.best_trial.trial_id  # private API, may change between versions
best_result = analysis.get_best_result()
best_result_score = best_result.metrics["r2_score"]
best_config = best_result.config
best_checkpoint = best_result.checkpoint

print(f"BEST TRIAL ID: {best_trial_id}")
print(f"BEST RESULT SCORE: {best_result_score}")
print(f"BEST CONFIG: {best_config}")
print(f"BEST CHECKPOINT: {best_checkpoint}")