How Tuun’s Automated Data Augmentation Boosts AI Model Accuracy
This article explains how Tuun, an open‑source Bayesian‑optimization tool, automatically searches data‑augmentation policies for machine‑learning models. It details the setup with Microsoft NNI, provides code and configuration examples, and presents experiments on CIFAR‑10/100 and SVHN showing that Tuun‑generated policies match or surpass expert‑tuned strategies and improve performance further when combined with them.
Why Data Augmentation Matters
Machine‑learning and deep‑learning models improve in accuracy and generalisation when trained on more data, but obtaining sufficient labelled data is often limited by privacy, security, or high annotation cost. Data augmentation synthesises new training examples from existing data to alleviate this problem.
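As a toy illustration (plain Python, not part of Tuun), a horizontal flip turns one labelled image into a second, equally valid training example without any new data collection:

```python
# A labelled image represented as a nested list of pixel rows.
image = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]

def horizontal_flip(img):
    """Return a new image with each pixel row reversed (mirror image)."""
    return [list(reversed(row)) for row in img]

augmented = horizontal_flip(image)
print(augmented[0])  # [2, 1, 0] -- same label, new training example
```

Operations like the flips, rotations, and color transforms used later in this article follow the same principle: the label is preserved while the pixels change.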
Automatic Search with Tuun
Tuun, an open‑source tool in the CASL ecosystem, formulates the search for an augmentation policy as a hyper‑parameter optimisation (HPO) problem and solves it with Bayesian optimisation. The workflow consists of defining a search space that describes a policy as a set of operations, probabilities and magnitudes, letting Tuun propose trials, training the model with each policy, and feeding the validation accuracy back to Tuun.
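In outline, the search is a propose-train-feedback loop. The following sketch uses random search as a stand-in for Tuun's Bayesian-optimization proposal step, and all function names and the dummy accuracy are illustrative, not Tuun's actual API:

```python
import random

def sample_policy():
    """Stand-in for Tuun's proposal step: draw one
    (operation, probability, magnitude) triple for one policy slot."""
    ops = ["shearX", "rotate", "equalize", "invert"]
    return {
        "operation1_1": random.choice(ops),
        "prob1_1": random.uniform(0, 1),
        "magnitude1_1": random.uniform(0, 1),
    }

def train_and_validate(policy):
    """Stand-in for training the model with `policy` applied to the
    training data; returns a dummy validation accuracy."""
    return random.uniform(0.80, 0.95)

best_policy, best_acc = None, 0.0
for trial in range(5):                    # the real experiments run 60 trials
    policy = sample_policy()              # 1. the tuner proposes a policy
    acc = train_and_validate(policy)      # 2. train with that policy
    if acc > best_acc:                    # 3. feed accuracy back; track best
        best_policy, best_acc = policy, acc
```

Bayesian optimization improves on this random loop by fitting a surrogate model to the (policy, accuracy) history and proposing policies expected to score well.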
Installation and Setup
git clone git@github.com:petuum/tuun.git && cd tuun
python -m pip install -r requirements/requirements_dev.txt
python tuun/probo/models/stan/compile_models.py -m gp_distmat_fixedsig
python -m pip install --upgrade nni

After installation, a JSON file (search_space.json) defines the augmentation search space. An excerpt of the file is shown below.
{
"operation1_1":{"_type":"choice","_value":["shearX","shearY","translateX","translateY","rotate","color","posterize","solarize","contrast","sharpness","brightness","autocontrast","equalize","invert","randomCrop","randomHorizontalFlip"]},
"prob1_1":{"_type":"uniform","_value":[0,1]},
...
"magnitude5_2":{"_type":"uniform","_value":[0,1]}
}

Policy Implementation
The policy class reads the parameters supplied by Tuun and builds a list of SubPolicy objects. Each sub‑policy contains two operations; during training one sub‑policy is randomly selected for each mini‑batch and each operation is applied with its associated probability.
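The SubPolicy class itself is not shown in the source excerpt. A minimal sketch that matches the keyword arguments used by the Policy class is given below; the operation table and the toy nested-list image format are illustrative assumptions, whereas the real implementation maps names like "rotate" or "shearX" to PIL transforms scaled by the magnitude:

```python
import random

class SubPolicy:
    """Apply up to two image operations, each with its own probability
    and magnitude, matching the constructor used by the Policy class."""

    # Illustrative operation table on nested-list images; the real code
    # dispatches to PIL transforms and uses `magnitude` and `fillcolor`.
    OPERATIONS = {
        "identity": lambda img, mag: img,
        "invert": lambda img, mag: [[255 - px for px in row] for row in img],
    }

    def __init__(self, p1, operation1, magnitude1,
                 p2, operation2, magnitude2, fillcolor=(128, 128, 128)):
        self.steps = [(p1, operation1, magnitude1),
                      (p2, operation2, magnitude2)]
        self.fillcolor = fillcolor

    def __call__(self, img):
        for prob, name, magnitude in self.steps:
            if random.random() < prob:   # apply each op with its probability
                img = self.OPERATIONS[name](img, magnitude)
        return img
```

With probability 1.0 an operation always fires and with probability 0.0 it never does, which is how the search space can effectively disable a slot.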
import random

class Policy(object):
    """Builds five two-operation sub-policies from the parameters
    proposed by Tuun and applies one at random per image."""

    def __init__(self, params, fillcolor=(128, 128, 128), image_size=32):
        self.policies = []
        for i in range(1, 6):
            self.policies.append(
                SubPolicy(
                    p1=params[f'prob{i}_1'],
                    operation1=params[f'operation{i}_1'],
                    magnitude1=params[f'magnitude{i}_1'],
                    p2=params[f'prob{i}_2'],
                    operation2=params[f'operation{i}_2'],
                    magnitude2=params[f'magnitude{i}_2'],
                    fillcolor=fillcolor))

    def __call__(self, img):
        # Pick one of the five sub-policies uniformly at random.
        policy_idx = random.randint(0, len(self.policies) - 1)
        img = self.policies[policy_idx](img)
        # self.resize (defined elsewhere in the source) restores image_size.
        return self.resize(img)

Running Experiments with NNI
A minimal NNI configuration points to the Tuun tuner and specifies the search space file.
tuner:
  codeDir: {Your_Tuun_Code_Parent_Directory}/tuun/tuun
  classFileName: nni_tuner.py
  className: TuunTuner
  classArgs:
    optimize_mode: maximize
    tuun_config:
      seed: 1
      model_config: {'name': 'standistmatgp'}
      acqfunction_config: {'name': 'default', 'acq_str': 'ei', 'n_gen': 500}
      acqoptimizer_config: {'n_init_rs': 5, 'jitter': True}
      probo_config: {'normalize_real': True}
  gpuIndices: '1'
trial:
  command: python3 main.py --extra 10 --dataset svhn
  codeDir: .
  gpuNum: 1

Running nnictl create --config nni_config.yml launches 60 trials, each training a ResNet‑18‑based model for 50 epochs on CIFAR‑10, CIFAR‑100, or SVHN.
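On the trial side, main.py talks to NNI through its standard trial API: nni.get_next_parameter and nni.report_final_result are real NNI functions, while train_with_policy and the import fallback below are illustrative stubs so the sketch runs standalone (the real main.py builds the ResNet-18 model and data loaders):

```python
try:
    import nni  # available when launched by `nnictl` inside an experiment
except ImportError:
    class _FakeNNI:  # stub so this sketch also runs outside NNI
        @staticmethod
        def get_next_parameter():
            return {"operation1_1": "rotate", "prob1_1": 0.5, "magnitude1_1": 0.3}
        @staticmethod
        def report_final_result(value):
            print(f"final accuracy: {value}")
    nni = _FakeNNI()

def train_with_policy(params):
    """Stand-in for the 50-epoch ResNet-18 training run using the
    augmentation policy described by `params`."""
    return 0.93

params = nni.get_next_parameter()      # policy proposed by TuunTuner
accuracy = train_with_policy(params)   # train with the proposed policy
nni.report_final_result(accuracy)      # validation accuracy fed back to Tuun
```

Because optimize_mode is maximize in the tuner config, Tuun treats the reported value as a score to increase across trials.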
Results
The results table (rendered as an image in the original post) compares four settings: no augmentation ("No Aug"), an expert‑tuned policy (ETP), the best policy found by Tuun, and Tuun combined with ETP. The "No Aug" baseline always yields the lowest test accuracy, Tuun's policies achieve accuracy comparable to ETP's, and adding Tuun's policy on top of ETP improves performance further. On datasets without a published expert policy (e.g., SVHN), Tuun still discovers policies that outperform the non‑augmented baseline.
Takeaways
Data augmentation is an effective way to boost model performance.
Tuun can automatically discover augmentation policies that rival manually crafted expert solutions.
Combining Tuun‑generated policies with existing expert policies can yield additional gains.
The approach works across multiple image classification benchmarks without hand‑tuning.