We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?