How Tuun’s Automated Data Augmentation Boosts AI Model Accuracy

The article explains how Tuun, an open‑source Bayesian‑optimization tool, automatically searches data‑augmentation policies for machine‑learning models, details the setup with Microsoft NNI, provides code and configuration examples, and presents experiments on CIFAR‑10/100 and SVHN showing that Tuun‑generated policies match or surpass expert‑tuned strategies and further improve performance when combined.


Why Data Augmentation Matters

Machine‑learning and deep‑learning models improve in accuracy and generalisation when trained on more data, but obtaining sufficient labelled data is often limited by privacy, security, or high annotation cost. Data augmentation synthesises new training examples from existing data to alleviate this problem.

Automatic Search with Tuun

Tuun, an open‑source tool in the CASL ecosystem, formulates the search for an augmentation policy as a hyper‑parameter optimisation (HPO) problem and solves it with Bayesian optimisation. The workflow consists of defining a search space that describes a policy as a set of operations, probabilities and magnitudes, letting Tuun propose trials, training the model with each policy, and feeding the validation accuracy back to Tuun.
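The workflow above can be sketched as a simple optimization loop. In this sketch, `propose_policy` and `train_and_eval` are hypothetical stand-ins: the real Tuun tuner proposes each trial via Bayesian optimization rather than random sampling, and the objective is a full model-training run rather than a toy function.

```python
import random

# Hypothetical stand-in objective: in practice this trains the model with the
# proposed augmentation policy and returns validation accuracy.
def train_and_eval(policy):
    # Toy surrogate that rewards moderate probabilities and magnitudes.
    return 1.0 - abs(policy["prob"] - 0.5) - abs(policy["magnitude"] - 0.5)

OPERATIONS = ["shearX", "rotate", "color", "invert"]

def propose_policy(rng):
    # Tuun would pick this point via its acquisition function; uniform random
    # sampling stands in for that step here.
    return {
        "operation": rng.choice(OPERATIONS),
        "prob": rng.random(),        # application probability in [0, 1]
        "magnitude": rng.random(),   # operation strength in [0, 1]
    }

def search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_policy, best_acc = None, float("-inf")
    for _ in range(n_trials):
        policy = propose_policy(rng)   # 1. tuner proposes a trial
        acc = train_and_eval(policy)   # 2. train the model with the policy
        if acc > best_acc:             # 3. validation accuracy fed back
            best_policy, best_acc = policy, acc
    return best_policy, best_acc
```

The only difference in the real setup is step 1: Tuun fits a surrogate model to the accuracies seen so far and proposes the next policy by maximizing an acquisition function.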

Installation and Setup

git clone git@github.com:petuum/tuun.git && cd tuun
python -m pip install -r requirements/requirements_dev.txt
python tuun/probo/models/stan/compile_models.py -m gp_distmat_fixedsig
python -m pip install --upgrade nni

After installation, a JSON file (search_space.json) defines the augmentation search space. An excerpt of the file is shown below.

{
    "operation1_1": {"_type": "choice",
                     "_value": ["shearX", "shearY", "translateX", "translateY",
                                "rotate", "color", "posterize", "solarize",
                                "contrast", "sharpness", "brightness",
                                "autocontrast", "equalize", "invert",
                                "randomCrop", "randomHorizontalFlip"]},
    "prob1_1": {"_type": "uniform", "_value": [0, 1]},
    ...
    "magnitude5_2": {"_type": "uniform", "_value": [0, 1]}
}
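Since the excerpt elides most entries, the full file can be generated programmatically. This sketch assumes the structure the excerpt implies (five sub-policies of two operations each, i.e. 30 entries, keyed `operation{i}_{j}`, `prob{i}_{j}`, `magnitude{i}_{j}` in NNI's search-space format); `build_search_space` and its defaults are illustrative, not part of Tuun.

```python
import json

# Candidate operations, as listed in the search-space excerpt.
OPS = ["shearX", "shearY", "translateX", "translateY", "rotate", "color",
       "posterize", "solarize", "contrast", "sharpness", "brightness",
       "autocontrast", "equalize", "invert", "randomCrop",
       "randomHorizontalFlip"]

def build_search_space(n_subpolicies=5, ops_per_subpolicy=2):
    # One operation choice plus a probability and a magnitude per slot.
    space = {}
    for i in range(1, n_subpolicies + 1):
        for j in range(1, ops_per_subpolicy + 1):
            space[f"operation{i}_{j}"] = {"_type": "choice", "_value": OPS}
            space[f"prob{i}_{j}"] = {"_type": "uniform", "_value": [0, 1]}
            space[f"magnitude{i}_{j}"] = {"_type": "uniform", "_value": [0, 1]}
    return space

if __name__ == "__main__":
    with open("search_space.json", "w") as f:
        json.dump(build_search_space(), f, indent=4)
```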

Policy Implementation

The policy class reads the parameters supplied by Tuun and builds a list of SubPolicy objects. Each sub-policy contains two operations; during training, one sub-policy is chosen at random for each image, and each of its operations is applied with its associated probability and magnitude.

import random

from torchvision import transforms

class Policy(object):
    def __init__(self, params, fillcolor=(128, 128, 128), image_size=32):
        # Build the five two-operation sub-policies from the tuner's parameters.
        self.policies = []
        for i in range(1, 6):
            self.policies.append(
                SubPolicy(
                    p1=params[f'prob{i}_1'],
                    operation1=params[f'operation{i}_1'],
                    magnitude1=params[f'magnitude{i}_1'],
                    p2=params[f'prob{i}_2'],
                    operation2=params[f'operation{i}_2'],
                    magnitude2=params[f'magnitude{i}_2'],
                    fillcolor=fillcolor))
        # The original excerpt called self.resize without defining it; a
        # torchvision Resize back to image_size is assumed here.
        self.resize = transforms.Resize(image_size)

    def __call__(self, img):
        # Apply one randomly chosen sub-policy, then resize the result.
        policy_idx = random.randint(0, len(self.policies) - 1)
        img = self.policies[policy_idx](img)
        return self.resize(img)
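The SubPolicy class itself is not shown in the article. A minimal sketch of its control flow might look like the following, with the image operations abstracted as plain callables rather than the real name-to-PIL-transform mapping:

```python
import random

class SubPolicy:
    """Minimal sketch of the SubPolicy used by Policy above.

    The real implementation maps operation names (e.g. "rotate") to image
    transforms and scales each magnitude to that operation's range; here
    each operation is an arbitrary callable so the control flow is clear.
    """

    def __init__(self, p1, operation1, magnitude1,
                 p2, operation2, magnitude2, fillcolor=(128, 128, 128)):
        self.steps = [(p1, operation1, magnitude1),
                      (p2, operation2, magnitude2)]
        self.fillcolor = fillcolor  # used by geometric ops in the real version

    def __call__(self, img):
        # Each of the two operations fires independently with its probability.
        for prob, op, magnitude in self.steps:
            if random.random() < prob:
                img = op(img, magnitude)
        return img
```

For example, `SubPolicy(1.0, op_a, 0.5, 0.0, op_b, 0.3)` always applies `op_a` with magnitude 0.5 and never applies `op_b`.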

Running Experiments with NNI

A minimal NNI configuration points to the Tuun tuner and specifies the search space file.

tuner:
  codeDir: {Your_Tuun_Code_Parent_Directory}/tuun/tuun
  classFileName: nni_tuner.py
  className: TuunTuner
  classArgs:
    optimize_mode: maximize
    tuun_config:
      seed: 1
      model_config: {'name': 'standistmatgp'}
      acqfunction_config: {'name': 'default', 'acq_str': 'ei', 'n_gen': 500}
      acqoptimizer_config: {'n_init_rs': 5, 'jitter': True}
      probo_config: {'normalize_real': True}
  gpuIndices: '1'
trial:
  command: python3 main.py --extra 10 --dataset svhn
  codeDir: .
  gpuNum: 1

Running nnictl create --config nni_config.yml launches 60 trials, each training a ResNet‑18‑based model for 50 epochs on CIFAR‑10, CIFAR‑100 or SVHN.
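A trial's entry point (main.py) can be sketched as below. `nni.get_next_parameter()` and `nni.report_final_result()` are NNI's standard trial API; `train_with_policy` is a hypothetical placeholder for the actual 50-epoch ResNet-18 training loop, not the article's implementation.

```python
def train_with_policy(params, epochs=50):
    # Placeholder: the real function builds Policy(params), adds it to the
    # training transforms, trains for `epochs`, and returns the final
    # validation accuracy.
    return 0.9  # fixed placeholder accuracy

def run_trial(get_params, report):
    params = get_params()            # augmentation policy proposed by the tuner
    acc = train_with_policy(params)  # train and evaluate with that policy
    report(acc)                      # accuracy fed back to the Tuun tuner
    return acc

# In the actual main.py these hooks are wired to NNI's trial API:
#
#     import nni
#     run_trial(nni.get_next_parameter, nni.report_final_result)
```

Keeping the NNI calls at the edges like this makes the training logic easy to run and test outside of an NNI experiment.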

Results

The table (originally an image) compares four settings: no augmentation, expert‑tuned policy (ETP), the best policy found by Tuun, and Tuun combined with ETP. The “No Aug” baseline always yields the lowest test accuracy. Tuun’s policies achieve accuracy comparable to ETP and, when added to ETP, further improve performance. On datasets where ETP is not available (e.g., SVHN), Tuun still discovers policies that outperform the non‑augmented baseline.

Takeaways

Data augmentation is an effective way to boost model performance.

Tuun can automatically discover augmentation policies that rival manually crafted expert solutions.

Combining Tuun‑generated policies with existing expert policies can yield additional gains.

The approach works across multiple image classification benchmarks without hand‑tuning.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Image Classification, Data Augmentation, AutoML, Bayesian Optimization, NNI, Tuun
Written by

Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
