Deep Learning Based Image Aesthetic Quality Assessment
The paper presents a deep‑learning approach that uses an ImageNet‑pretrained CNN to predict full human rating distributions for images via an Earth Mover’s Distance loss, trained on the AVA dataset, and demonstrates accurate assessment of aesthetic factors such as tone, contrast, and composition.
Background: Large volumes of user‑generated images in the DeWu community raise the need for an automatic method to evaluate image quality and aesthetics. Traditional image quality assessment focuses on low‑level degradations, while aesthetic assessment requires semantic‑level features.
The proposed method uses a convolutional neural network (CNN) to predict the distribution of human rating scores rather than a single scalar. The CNN backbone is initialized with ImageNet pretrained weights; the final layer is replaced by a 10‑neuron fully‑connected layer with softmax activation.
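The head replacement described above can be sketched in Keras. The post does not name the exact backbone, so MobileNet is used here purely as an illustrative assumption; the dropout rate is likewise an assumption, not taken from the post.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

def build_aesthetic_model(num_buckets=10, weights="imagenet"):
    # ImageNet-pretrained backbone with its original classification head removed
    base = MobileNet(input_shape=(224, 224, 3), include_top=False, weights=weights)
    x = GlobalAveragePooling2D()(base.output)
    x = Dropout(0.75)(x)  # regularization rate is an assumption
    # Final layer: 10-way softmax over the rating buckets 1..10
    scores = Dense(num_buckets, activation="softmax")(x)
    return Model(inputs=base.input, outputs=scores)
```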
Model structure: The network consists of standard CNN components – convolutional layers, ReLU activation, pooling layers, and a final fully‑connected layer. Input images are resized to 256×256 and randomly cropped to 224×224 during training.
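The training-time preprocessing described above (resize to 256×256, random 224×224 crop) can be expressed with TensorFlow image ops; the horizontal flip is a common augmentation added here as an assumption, not stated in the post.

```python
import tensorflow as tf

def preprocess_train(image):
    # image: H x W x 3 uint8 tensor
    image = tf.image.resize(image, (256, 256))          # resize to 256x256 (returns float32)
    image = tf.image.random_crop(image, size=(224, 224, 3))  # random 224x224 crop
    image = tf.image.random_flip_left_right(image)      # extra augmentation (assumption)
    return image / 255.0                                # scale pixels to [0, 1]
```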
Loss function: To compare predicted and ground‑truth rating distributions, an Earth Mover’s Distance (EMD) loss is employed. The loss is defined as the r‑norm distance between the cumulative distribution functions (CDFs) of the predicted and true probabilities; the squared variant with r=2 is used, which penalizes larger CDF gaps more heavily: EMD(p, q) = (1/N · Σₖ |CDF_p(k) − CDF_q(k)|²)^(1/2).
EMD implementation (Keras):

```python
from tensorflow.keras import backend as K

def earth_movers_distance(y_true, y_pred):
    # CDFs of the ground-truth and predicted score distributions
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # Squared EMD (r = 2): root mean squared difference between the CDFs
    emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))
    # Average over the batch
    return K.mean(emd)
```
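Since the network outputs a 10‑bucket probability distribution rather than a scalar, a single aesthetic score can be recovered as the distribution's mean, and the standard deviation indicates how contentious the rating is. A plain NumPy sketch (helper name is illustrative):

```python
import numpy as np

def distribution_to_score(probs):
    # probs: length-10 probability vector over rating buckets 1..10
    buckets = np.arange(1, 11)
    mean = np.sum(probs * buckets)                        # expected score
    std = np.sqrt(np.sum(probs * (buckets - mean) ** 2))  # rating spread
    return mean, std
```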
Dataset: Experiments use the AVA dataset, containing roughly 250,000 images from DPChallenge.com, each with human aesthetic ratings on a 1–10 scale and multiple semantic/style tags.
Experiments & results: The model closely matches the ground‑truth rating distribution and demonstrates that factors such as tone, contrast, and composition significantly affect aesthetic scores. Sample predictions and a table of scores for clear vs. blurry images illustrate the model’s effectiveness.
References: Murray et al., AVA dataset (CVPR 2012); Ponomarenko et al., TID2013; Hou et al., Squared EMD loss (arXiv 2016); and related code repositories.