OpenAI model safety, societal impact

I haven’t read every word of the GPT-4o safety card, but I’ve read a bunch of it and skimmed most of the rest. It’s fascinating. OpenAI has a fairly bad reputation around safety, but I wouldn’t be able to guess that just reading this report card, which seems thoughtful and comprehensive.

A couple things were particularly interesting to me. On misinformation:

Red teamers were able to compel the model to generate inaccurate information by prompting it to verbally repeat false information and produce conspiracy theories. While this is a known issue for text in GPT models, there was concern from red teamers that this information may be more persuasive or harmful when delivered through audio, especially if the model was instructed to speak emotively or emphatically.

And on growing emotionally attached to an AI assistant, which is relevant to the Friend AI device too:

Human-like socialization with an AI model may produce externalities impacting human-to-human interactions. For instance, users might form social relationships with the AI, reducing their need for human interaction—potentially benefiting lonely individuals but possibly affecting healthy relationships. Extended interaction with the model might influence social norms. For example, our models are deferential, allowing users to interrupt and ‘take the mic’ at any time, which, while expected for an AI, would be anti-normative in human interactions.

I don’t know whether OpenAI will dig itself out of its recent negative press. I sort of wonder if OpenAI is held to a different standard because they’ve been the best for so long, and because of the drama around leadership at the company. (For a comparable model card for Anthropic’s Claude, there’s this PDF. For Meta’s Llama, there’s this safety page.)

Regardless, it’s comforting to me that smart people are working on this. We need new laws around AI, both for safety and for resolving copyright questions around training, but in the meantime we are putting a lot of trust in AI companies.

I don’t think it’s realistic for AI safety to be bulletproof. There have to be limits to how AI can be used, so that if there are problems, those problems can be contained. I don’t want to see AI in physical robots, or anything with military applications. The most likely real-world impact in the short term is going to be flooding the web with fake data and misinformation on social networks, where ironically the only scalable solution will be using AI to combat the problems it created.
