2023.09.01.

A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features’: Discussion and Author Responses

We want to thank all the commenters for the discussion and for spending time
designing experiments that analyze, replicate, and expand upon our results.
These comments not only helped us further refine our understanding of adversarial
examples (e.g., by visualizing useful non-robust features or illustrating how
robust models are successful at downstream tasks), but also highlighted aspects
of our exposition that could be made clearer and more explicit.

Our response is organized as follows: we first recap the key takeaways from
our paper, followed by some clarifications that this discussion brought to
light. We then address each comment individually, prefacing each longer response
with a quick summary.

We also recall some terminology from
our paper that features in our responses:

Datasets: Our experiments involve the following variants of the given
dataset $\mathcal{D}$ (consisting of sample-label pairs $(x, y)$). The
exact details of how these datasets are constructed can be found in our
paper, and the datasets themselves can be downloaded at
http://git.io/adv-datasets:

  • $\widehat{\mathcal{D}}_{R}$
  • $\widehat{\mathcal{D}}_{NR}$
  • $\widehat{\mathcal{D}}_{det}$
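To make the labeling rule behind $\widehat{\mathcal{D}}_{det}$ concrete, here is a minimal sketch. In the paper, each input $x$ with label $y$ is perturbed toward a target class chosen deterministically from $y$, for example $t = (y + 1) \bmod C$, and the perturbed input is then relabeled as $t$. The sketch assumes a toy linear softmax classifier in place of a trained deep network and uses a targeted $\ell_2$ projected-gradient attack. The helper names (`targeted_l2_pgd`, `build_det_dataset`) and the values of `eps`, `step`, and `iters` are hypothetical illustration choices, not the settings used to build the released datasets.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W: Matrix, b: Sequence[float], x: Sequence[float]) -> Vector:
    """Toy linear classifier (an assumption): one row of W plus a bias per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def targeted_l2_pgd(W: Matrix, b: Sequence[float], x0: Sequence[float],
                    target: int, eps: float = 0.5, step: float = 0.1,
                    iters: int = 20) -> Vector:
    """Hypothetical helper: minimize cross-entropy toward `target` with normalized
    gradient steps, projecting back onto the l2 ball of radius eps around x0."""
    x = list(x0)
    for _ in range(iters):
        p = softmax(logits(W, b, x))
        # For a linear model, d(-log p_target)/dx = W^T (p - onehot(target)).
        coeff = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        grad = [sum(coeff[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
        gnorm = math.sqrt(sum(g * g for g in grad)) or 1.0
        x = [xj - step * gj / gnorm for xj, gj in zip(x, grad)]
        # Projection step: keep ||x - x0||_2 <= eps.
        delta = [xj - x0j for xj, x0j in zip(x, x0)]
        dnorm = math.sqrt(sum(d * d for d in delta))
        if dnorm > eps:
            x = [x0j + d * eps / dnorm for x0j, d in zip(x0, delta)]
    return x


def build_det_dataset(W: Matrix, b: Sequence[float],
                      data: Sequence[Tuple[Sequence[float], int]],
                      num_classes: int, eps: float = 0.5) -> List[Tuple[Vector, int]]:
    """Hypothetical helper: attack toward t = (y + 1) mod C, then relabel with t."""
    out = []
    for x, y in data:
        t = (y + 1) % num_classes
        out.append((targeted_l2_pgd(W, b, x, t, eps=eps), t))
    return out
```

For example, with a 3-class toy model and inputs labeled 0, 1, 2, the new labels are 1, 2, 0. Each input moves at most `eps` in $\ell_2$ distance toward its new label. The new label is therefore consistent with the added perturbation but not with the original input. That mismatch is what makes training on $\widehat{\mathcal{D}}_{det}$ informative.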