Projects and Writing

Tony Wang's Personal Website

A selection of work I am proud of, done with many wonderful collaborators.

2024-09-01 — 2025-10-24 Learning to interpret weight differences in language models
2023-10-12 — 2024-07-01 Covert malicious finetuning
2023-07-20 — 2024-06-20 Can Go AIs be adversarially robust?
2023-06-08 — 2023-12-15 Takeaways from a mechinterp project
2021-09-15 — 2023-07-20 Adversarial policies beat superhuman Go AIs
2021-01-10 — 2021-07-02 Taylor expansions: An easy derivation
2015-06-01 — 2015-12-23 Codeforces round #336