Website aesthetics play an important role in attracting users and customers as well as in enhancing user experience.
In this repository, we develop deep learning models that evaluate the aesthetics of a website and whose scores correlate highly with human perception.
Paper: "Calista: A deep learning-based system for understanding and evaluating website aesthetics"
Cite as:
@article{DELITZAS2023,
title = {Calista: A deep learning-based system for understanding and evaluating website aesthetics},
journal = {International Journal of Human-Computer Studies},
volume = {175},
pages = {103019},
year = {2023},
issn = {1071-5819},
doi = {https://doi.org/10.1016/j.ijhcs.2023.103019},
url = {https://www.sciencedirect.com/science/article/pii/S1071581923000253},
author = {Alexandros Delitzas and Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis}
}
For the training process, we rely on two different datasets.
- Rating-based dataset: The users were asked to rate a webpage screenshot by providing an explicit numerical value on a scale
- Comparison-based dataset: The users were asked to compare two different webpage screenshots at a time and choose which of the two is preferable
You can find more about these datasets here.
Models trained and evaluated using the Rating-based dataset.
Evaluation method: Training set (75.4%) - Test set (24.6%)
Results synopsis:
Rating-based Model | Pearson Correlation Coefficient | RMSE | Accuracy (2 classes) |
---|---|---|---|
Approach I | 0.78 [0.69, 0.85] | 0.616 | 88.78 % |
Approach II | 0.76 [0.66, 0.83] | 0.662 | 84.69 % |
Approach III | 0.78 [0.68, 0.85] | 0.628 | 83.67 % |
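
The three metrics reported in the table can be computed from predicted and ground-truth scores roughly as follows. This is a minimal sketch, not the repository's evaluation code: the function names are hypothetical, the sample scores are made up, and the two-class threshold of 5.0 (the midpoint of the 1-9 scale) is an assumption, since this section does not state how the two classes are defined.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def rmse(predicted, actual):
    """Root mean squared error between predictions and ground truth."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def two_class_accuracy(predicted, actual, threshold=5.0):
    """Fraction of items placed on the same side of `threshold`.

    Assumption: scores above `threshold` are "high", the rest "low".
    """
    hits = sum((p > threshold) == (a > threshold)
               for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up scores, for illustration only.
predicted = [3.0, 6.5, 7.0, 4.0]
actual = [2.5, 6.0, 8.0, 5.5]

print(pearson(predicted, actual))
print(rmse(predicted, actual))
print(two_class_accuracy(predicted, actual))
```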
Approach I
Description: In this approach, the model was trained on the mean value of the user ratings for each website. The model's output is an aesthetics score on a scale of 1-9.
Transfer learning: Flickr-Style was used as the base network
Approach II
Description: In this approach, the model was trained on the distribution of the user ratings for each website, expressed as an empirical probability mass function. The model's output is a predicted distribution of the aesthetics scores. The final score is the mean value of the predicted distribution (scale 1-9).
Transfer learning: NIMA-MobileNet was used as the base network
Approach III
Description: In this approach, the model was trained on all individual (webpage, user rating) pairs. The model's output is an aesthetics score on a scale of 1-9.
Transfer learning: Flickr-Style was used as the base network
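
For the distribution-based approach, the two computations involved (turning raw ratings into an empirical probability mass function, and reducing a distribution to a single score via its mean) can be sketched as below. The function names and the sample ratings are hypothetical, and integer ratings on the 1-9 scale are assumed; the repository's own preprocessing may differ.

```python
def empirical_pmf(ratings, scale=range(1, 10)):
    """Relative frequency of each score on the scale (1-9 by default)."""
    counts = {k: 0 for k in scale}
    for r in ratings:
        counts[r] += 1
    total = len(ratings)
    return [counts[k] / total for k in scale]


def mean_score(pmf, scale=range(1, 10)):
    """Expected value of a distribution over the scale: sum of k * p(k)."""
    return sum(k * p for k, p in zip(scale, pmf))


# Made-up ratings for one website, for illustration only.
ratings = [5, 6, 6, 7, 9]
pmf = empirical_pmf(ratings)   # training target: one probability per score
score = mean_score(pmf)        # same reduction applied to a predicted distribution
print(pmf)
print(score)
```

Applied to the empirical distribution, `mean_score` simply recovers the average rating; applied to the model's predicted distribution, it yields the final aesthetics score.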
Models trained and evaluated using the Comparison-based dataset.
Evaluation method: Leave-one-out Cross Validation
Results synopsis:
Comparison-based Model | Pearson Correlation Coefficient | RMSE |
---|---|---|
Approach I | 0.70 [0.58, 0.79] | 1.353 |
Description: The dataset contains an aesthetics score for each website, calculated with the Bradley-Terry model from the pairwise comparisons collected from the users. In this approach, the model was trained on these Bradley-Terry ratings. The model's output is an aesthetics score on a scale of 1-10.
Transfer-learning: Calista Rating-Based was used as a base network
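
To illustrate how pairwise preferences can be turned into per-website scores, here is a minimal sketch of fitting a Bradley-Terry model, in which the probability that website i is preferred over website j is p_i / (p_i + p_j). It uses the standard minorization-maximization (MM) update, chosen here for illustration; this section does not say which fitting procedure produced the dataset's ratings or how they were mapped onto the 1-10 scale. The function name and the win counts are hypothetical, and the sketch assumes every website has at least one win and one comparison.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths with the MM algorithm.

    wins[i][j] is the number of times item i was preferred over item j
    (the diagonal must be zero). Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


# Made-up comparison counts for three websites, for illustration only.
wins = [
    [0, 3, 4],  # website 0 preferred 3 times over 1 and 4 times over 2
    [1, 0, 3],
    [1, 2, 0],
]
strengths = fit_bradley_terry(wins)
print(strengths)
```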
- Step 1: Download the datasets: `git submodule update --init`
- Step 2: Download the pretrained models into the folder pretrained-models/. Links can be found here.
The code was tested using Python 3.6.