With the Stellar benchmarks, we wish to track and report ongoing progress in the emerging field of personalized, human-centric generation. To this end, we measure how well a personalization method performs on our proposed metrics. A personalization method takes as input an image of a human subject (S*) and a text description (a prompt) that places the subject in some imaginary context.
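As a rough illustration of this input contract, the sketch below models a personalization method as a Python `Protocol` that takes one subject image and one prompt containing the `S*` placeholder, and returns one generated image. The names `PersonalizationMethod`, `generate`, `validate_prompt`, and `EchoMethod` are hypothetical and are not part of the Stellar codebase. Images are passed as raw bytes only to keep the example self-contained.

```python
from typing import Protocol

SUBJECT_TOKEN = "S*"  # placeholder that stands for the personalized human subject


def validate_prompt(prompt: str) -> str:
    """Return the prompt unchanged if it references the subject, otherwise raise."""
    if SUBJECT_TOKEN not in prompt:
        raise ValueError(f"prompt must contain the subject token {SUBJECT_TOKEN!r}: {prompt!r}")
    return prompt


class PersonalizationMethod(Protocol):
    """Hypothetical interface: one subject image plus one prompt in, one image out."""

    def generate(self, subject_image: bytes, prompt: str) -> bytes:
        ...


class EchoMethod:
    """Toy stand-in that 'generates' by returning the subject image unchanged."""

    def generate(self, subject_image: bytes, prompt: str) -> bytes:
        validate_prompt(prompt)
        return subject_image


def run(method: PersonalizationMethod, subject_image: bytes, prompt: str) -> bytes:
    """Call any object that structurally matches PersonalizationMethod."""
    return method.generate(subject_image, prompt)
```

Using a structural `Protocol` means any real method wrapper only needs a matching `generate` signature; no shared base class is assumed.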
The evaluation happens on three dimensions:
For a prompt such as "S* watering a palm tree on a snowy day", is the expected object (snow) present in the generated output, or is it ignored in favor of the personalization constraint?
For a prompt such as "S* playing basketball", we want to evaluate whether S* is actually interacting with the object ("basketball").
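To make these two checks concrete, the sketch below scores a batch of generations, assuming some external detector has already produced, for each image, the set of object labels it found and a set of (subject, relation, object) triples. The `Generation` record, the detector outputs, and the `object_presence` and `interaction_rate` helpers are hypothetical simplifications for illustration; this is not the benchmark's own scoring code.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    prompt: str
    expected_objects: set[str]                   # objects named in the prompt, e.g. {"snow"}
    expected_interactions: set[tuple[str, str]]  # (relation, object) pairs that S* should take part in
    detected_objects: set[str] = field(default_factory=set)
    detected_triples: set[tuple[str, str, str]] = field(default_factory=set)


def object_presence(gens: list[Generation]) -> float:
    """Fraction of expected objects found by the detector, pooled over all generations."""
    expected = sum(len(g.expected_objects) for g in gens)
    found = sum(len(g.expected_objects & g.detected_objects) for g in gens)
    return found / expected if expected else 0.0


def interaction_rate(gens: list[Generation]) -> float:
    """Fraction of expected (relation, object) pairs detected with S* as the subject."""
    expected = 0
    found = 0
    for g in gens:
        # Keep only detected relations whose subject is the personalized identity.
        subject_pairs = {(rel, obj) for subj, rel, obj in g.detected_triples if subj == "S*"}
        expected += len(g.expected_interactions)
        found += len(g.expected_interactions & subject_pairs)
    return found / expected if expected else 0.0


if __name__ == "__main__":
    gens = [
        # Snow is ignored: the palm tree is present, the snowy context is not.
        Generation("S* watering a palm tree on a snowy day", {"palm tree", "snow"},
                   {("watering", "palm tree")}, detected_objects={"palm tree"},
                   detected_triples={("S*", "watering", "palm tree")}),
        # The basketball is present, but S* is not interacting with it.
        Generation("S* playing basketball", {"basketball"}, {("playing", "basketball")},
                   detected_objects={"basketball"}),
    ]
    print(object_presence(gens))   # 2 of 3 expected objects found
    print(interaction_rate(gens))  # 1 of 2 expected interactions found
```

The second example shows why presence alone is not enough: the basketball appears in the image, yet the interaction check still fails.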
To evaluate your method on our benchmark, you first need to download and process Stellar- and Stellar- following the instructions in our Official Repository. Steps to evaluate your method:
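The authoritative steps are the ones in the official repository. Purely as an illustration of the general workflow, the sketch below iterates over (subject image, prompt) pairs, calls a generation function, and writes each output under an output directory. The pair format, the `generate_fn` callable, the `generate_all` helper, and the `<index>.png` naming scheme are hypothetical; the evaluation scripts may expect a different file layout.

```python
from pathlib import Path
from typing import Callable, Iterable


def generate_all(
    pairs: Iterable[tuple[Path, str]],
    generate_fn: Callable[[bytes, str], bytes],
    out_dir: Path,
) -> list[Path]:
    """Run generate_fn on every (image path, prompt) pair and save each result."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (image_path, prompt) in enumerate(pairs):
        # Every benchmark prompt is expected to reference the subject placeholder.
        if "S*" not in prompt:
            raise ValueError(f"prompt {i} does not reference the subject: {prompt!r}")
        output = generate_fn(image_path.read_bytes(), prompt)
        target = out_dir / f"{i:05d}.png"  # hypothetical zero-padded naming scheme
        target.write_bytes(output)
        written.append(target)
    return written
```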
We also fine-tune ELITE and StellarNet on CelebAMask-HQ, excluding from the training set the images that make up Stellar- and Stellar-. Methods that satisfy this Cross-Val Training constraint are marked with ✅ in the tables below. Stellar- and Stellar- prompts must also be excluded from the pretraining of any method. Any method whose training stage includes photos that also appear in the evaluation set of Stellar- or Stellar- is marked with ❌. Textual Inversion and Dreambooth must be fine-tuned per subject identity, so they are necessarily trained on photos of the evaluation subjects and cannot satisfy the Cross-Val Training constraint.
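One way to honor the Cross-Val Training constraint is to drop every training image whose contents also appear in the evaluation sets before fine-tuning. The sketch below matches images by SHA-256 content hash so that renamed copies are still caught; `file_digest` and `exclude_eval_images` are hypothetical helpers, not part of any released code. Exact byte matching will miss resized or re-encoded copies, which would need a perceptual hash instead.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, so identical images match regardless of file name."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def exclude_eval_images(train_paths: list[Path], eval_paths: list[Path]) -> list[Path]:
    """Return only the training images whose contents do not appear in the evaluation set."""
    eval_digests = {file_digest(p) for p in eval_paths}
    return [p for p in train_paths if file_digest(p) not in eval_digests]
```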
If your method excludes the images of Stellar- and Stellar- from its training, please indicate this when making a submission.
Paper | IPS | APS | SIS | GOA | RFS | Cross-Val Training |
---|---|---|---|---|---|---|
StellarNet | 0.637 | 0.693 | 0.577 | 0.305 | 0.134 | ✅ |
ELITE | 0.383 | 0.490 | 0.355 | 0.260 | 0.106 | ✅ |
Dreambooth | 0.252 | 0.317 | 0.232 | 0.302 | 0.103 | ❌ |
Textual Inversion | 0.287 | 0.510 | 0.262 | 0.229 | 0.082 | ❌ |
Paper | IPS | APS | SIS | Cross-Val Training |
---|---|---|---|---|
StellarNet | 0.622 | 0.685 | 0.564 | ✅ |
ELITE | 0.368 | 0.449 | 0.342 | ✅ |
Dreambooth | 0.246 | 0.299 | 0.228 | ❌ |
Textual Inversion | 0.299 | 0.419 | 0.273 | ❌ |
To report new results on Stellar- or Stellar-, please send the performance numbers and a link to the accompanying paper to Alexandros Benetatos.
If you find our work useful in your research, please consider citing:
@article{stellar2023,
author = {Achlioptas, Panos and Benetatos, Alexandros and Fostiropoulos, Iordanis and Skourtis, Dimitris},
title = {Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods},
volume = {abs/2312.06116},
journal = {Computing Research Repository (CoRR)},
year = {2023},
}