Tony Wang's Personal Website

Welcome to my personal website. I am a member of technical staff at the US Center for AI Standards and Innovation and a PhD candidate at MIT. The overall goal of my work and research is to enable humanity to realize the benefits of advanced AI while adequately managing its downsides.

Research Interests

Much of my previous work and thinking has been on adversarial robustness. I’ve thought about the phenomenon both in simplified toy settings and in the setting of superhuman game-playing agents. I’m interested in adversarial robustness for two key reasons:

At the moment, I’m focusing on robustness in the vision domain as a stepping stone to robustness more broadly. My key focus is on developing techniques that can improve robustness against unrestricted adversaries. In the vision domain, I am particularly interested in how we can make progress on something like the Unrestricted Adversarial Examples Challenge.

In the language domain, the question that interests me the most is this one: Given a natural language specification for how an AI system should behave, how can we build capable systems that robustly satisfy the specification? Ideas related to this question that interest me include model specs, scalable oversight, relaxed adversarial training, representation engineering, stateful defenses against adversaries, and AI control.

Finally, a new topic I have been exploring recently is the ability of AI systems to introspect on their own cognition. More to come on this soon.

Contact

If you would like to chat with me about the topics mentioned on this site, please contact me at twang6 [at] mit [dot] edu.

Some links: Twitter, Google Scholar, CV.