Tony Wang's Personal Website
Projects and Writing
A selection of works I am proud of. These were done with many wonderful collaborators.
2024-09-01 — 2025-10-24
Learning to interpret weight differences in language models
2023-10-12 — 2024-07-01
Covert malicious finetuning
2023-07-20 — 2024-06-20
Can Go AIs be adversarially robust?
2023-06-08 — 2023-12-15
Takeaways from a mechinterp project
2021-09-15 — 2023-07-20
Adversarial policies beat superhuman Go AIs
2021-01-10 — 2021-07-02
Taylor expansions: An easy derivation
2015-06-01 — 2015-12-23
Codeforces round #336