Microsoft ImageBERT | Cross-modal Pretraining with Large-scale Image-Text Data | Synced
A recent paper published by Microsoft researchers proposes a new vision-language pretrained model for image-text joint embedding, ImageBERT, which which achieves SOTA performance on both the MSCOCO...
Source: Synced | AI Technology & Industry Review
A recent paper published by Microsoft researchers proposes a new vision-language pretrained model for image-text joint embedding, ImageBERT, which which achieves SOTA performance on both the MSCOCO and Flickr30k datasets.