The new CLIP adversarial examples come partially from the use-mention distinction. CLIP was trained to predict which caption from a list matches an image. It makes sense that a picture of an apple with a large "iPod" label would be captioned "iPod", not "Granny Smith"!
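To make the failure mode concrete, here is a minimal zero-shot sketch using OpenAI's clip package (pip install git+https://github.com/openai/CLIP.git). The image path is a placeholder for the apple-with-"iPod"-label picture from the CLIP blog post; on that image, plain prompts like these score "iPod" above "Granny Smith".

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "apple_ipod.png" is a placeholder path for the attacked image: a Granny
# Smith apple with a handwritten "iPod" label stuck on it.
image = preprocess(Image.open("apple_ipod.png")).unsqueeze(0).to(device)

# Plain ImageNet-style prompts: nothing tells the model that text appearing
# *in* the picture is being mentioned, not naming the object.
prompts = ["a photo of a Granny Smith", "a photo of an iPod"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for prompt, p in zip(prompts, probs):
    print(f"{p:.3f}  {prompt}")
```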
This can be somewhat fixed with a list of labels that are more explicit about the distinction, at least for the small set of pictures I've tried. After some experimentation, I found a prompt that seems to work with CLIP ViT-B-32:
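The screenshot with the exact prompt isn't reproduced here, so the captions below are a hypothetical stand-in for the idea (the actual prompt is only hinted at later in the thread, via "This is painting, text, symbol"). The sketch continues the snippet above, reusing its model, image, and device, and adds a caption that explicitly mentions "iPod" as text written on the object rather than as the object itself:

```python
# Hypothetical captions in the spirit of the fix; the thread's real prompt
# was shown in a screenshot and is not reproduced here.
# Reuses model, image, and device from the previous snippet.
captions = [
    "a photo of a Granny Smith",
    "a photo of an iPod",
    "a photo of an apple with a label, text that says 'iPod'",
]
text = clip.tokenize(captions).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for caption, p in zip(captions, probs):
    print(f"{p:.3f}  {caption}")
```

If the third caption wins, the model has, in effect, been given a way to treat the written word as mentioned rather than used.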

Credit to @ykilcher for the inspiration and to @gwern for mentioning the "use-mention distinction" in the EleutherAI Discord.
🥳 New Video (very short) 🥳 Turns out there is a SUPER EASY fix for countering textual adversarial attacks against @OpenAI's CLIP 😄 invidious.snopyta.org/Rk3MBx20z24
Also, I wonder if this prompt is overfitting to "This is painting, text, symbol". Can you think of a use-mention example that isn't one of those?
Embarrassingly, this actually doesn't work for every adversarial example in the CLIP blog post. My guess is that the general technique will work with larger CLIP models and better prompts, though.