Deep Learning Based Image Aesthetic Quality Assessment
The paper presents a deep‑learning approach that uses an ImageNet‑pretrained CNN to predict full human rating distributions for images via an Earth Mover’s Distance loss, trained on the AVA dataset, and demonstrates accurate assessment of aesthetic factors such as tone, contrast, and composition.
Background: Large volumes of user‑generated images in the DeWu community raise the need for an automatic method to evaluate image quality and aesthetics. Traditional image quality assessment focuses on low‑level degradations, while aesthetic assessment requires semantic‑level features.
The proposed method uses a convolutional neural network (CNN) to predict the distribution of human rating scores rather than a single scalar. The CNN backbone is initialized with ImageNet pretrained weights; the final layer is replaced by a 10‑neuron fully‑connected layer with softmax activation.
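The head replacement described above can be sketched in Keras. The post does not name the exact backbone, so MobileNet is used here purely as an illustrative assumption; the dropout rate is likewise an assumption, not taken from the post.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

def build_aesthetic_model(num_buckets=10, weights="imagenet"):
    # ImageNet-pretrained backbone with its original classification head removed
    base = MobileNet(input_shape=(224, 224, 3), include_top=False, weights=weights)
    x = GlobalAveragePooling2D()(base.output)
    x = Dropout(0.75)(x)  # regularization rate is an assumption
    # Final layer: 10-way softmax over the rating buckets 1..10
    scores = Dense(num_buckets, activation="softmax")(x)
    return Model(inputs=base.input, outputs=scores)
```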
Model structure: The network consists of standard CNN components – convolutional layers, ReLU activation, pooling layers, and a final fully‑connected layer. Input images are resized to 256×256 and randomly cropped to 224×224 during training.
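The training-time preprocessing described above (resize to 256×256, random 224×224 crop) can be expressed with TensorFlow image ops; the horizontal flip is a common augmentation added here as an assumption, not stated in the post.

```python
import tensorflow as tf

def preprocess_train(image):
    # image: H x W x 3 uint8 tensor
    image = tf.image.resize(image, (256, 256))          # resize to 256x256 (returns float32)
    image = tf.image.random_crop(image, size=(224, 224, 3))  # random 224x224 crop
    image = tf.image.random_flip_left_right(image)      # extra augmentation (assumption)
    return image / 255.0                                # scale pixels to [0, 1]
```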
Loss function: To compare predicted and ground‑truth rating distributions, an Earth Mover’s Distance (EMD) loss is employed. The loss is defined as the r‑norm distance between the cumulative distribution functions (CDFs) of the predicted and true probabilities; the squared variant with r=2 is used, which penalizes larger CDF gaps more heavily: EMD(p, q) = (1/N · Σₖ |CDF_p(k) − CDF_q(k)|²)^(1/2).
EMD implementation (Keras):

```python
from tensorflow.keras import backend as K

def earth_movers_distance(y_true, y_pred):
    # CDFs of the ground-truth and predicted score distributions
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # Squared EMD (r = 2): root mean squared difference between the CDFs
    emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))
    # Average over the batch
    return K.mean(emd)
```
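Since the network outputs a 10‑bucket probability distribution rather than a scalar, a single aesthetic score can be recovered as the distribution's mean, and the standard deviation indicates how contentious the rating is. A plain NumPy sketch (helper name is illustrative):

```python
import numpy as np

def distribution_to_score(probs):
    # probs: length-10 probability vector over rating buckets 1..10
    buckets = np.arange(1, 11)
    mean = np.sum(probs * buckets)                        # expected score
    std = np.sqrt(np.sum(probs * (buckets - mean) ** 2))  # rating spread
    return mean, std
```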
Dataset: Experiments use the AVA dataset, containing roughly 250,000 images from DPChallenge.com, each with human aesthetic ratings on a 1–10 scale and multiple semantic/style tags.
Experiments & results: The model closely matches the ground‑truth rating distribution and demonstrates that factors such as tone, contrast, and composition significantly affect aesthetic scores. Sample predictions and a table of scores for clear vs. blurry images illustrate the model’s effectiveness.
References: Murray et al., AVA dataset (CVPR 2012); Ponomarenko et al., TID2013; Hou et al., Squared EMD loss (arXiv 2016); and related code repositories.