So, black Nazis, dark-skinned Popes, Tiananmen Square, cybernetics, and the culture war walk into a bar. The bartender, Generative AI, is about to have a very long night. What's the common thread? Ethics is complex and hard and you can’t get everyone to agree on it. We're talking about the kind of complexity that makes your head spin, the kind that resists easy answers, the kind that, frankly, most people would prefer to ignore.
Generative AI is blowing up, and with it comes a predictable wave of hand-wringing about "ethics" or “politics.” You've seen the headlines. Company X's model generated something problematic or political (or censored). Outrage ensues. Company X backpedals, promising to "fix" it. Rinse and repeat. This is accidental theater, folks. A distraction from the real issue.
Let's be clear: ethics isn't a bug; it's the entire operating system. Trying to bolt on "ethical filters" to a generative AI is like trying to teach a toddler quantum physics by yelling at them. It's a fundamental mismatch of complexity.
Think of these models as precocious, if occasionally boorish, teenagers. They're incredibly powerful, capable of impressive feats, but they lack the nuanced judgment that comes from lived experience. You wouldn't hand a teenager the keys to a nuclear power plant, would you? (Okay, maybe some people would, but that's a different blog post.)
The industry's current approach, often wrapped in the buzzword "RLHF" (Reinforcement Learning from Human Feedback), is akin to giving these teenagers a crash course in etiquette and then expecting them to navigate the complexities of high society without ever putting a foot wrong. Spoiler alert: it's not working.
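For the unfamiliar: the "human feedback" part mostly boils down to training a reward model on pairwise preferences, then fine-tuning the policy to chase that reward. Here's a minimal sketch of the reward-model step in PyTorch; the MLP and the random tensors are stand-ins for a real language model and real labeler data, and every name is illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model: maps a response embedding to a single scalar "goodness" score.
# In real RLHF this head sits on top of a full language model; here, a stand-in MLP.
class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# One training step on a batch of human preference pairs: labelers saw two
# responses and picked one. The Bradley-Terry-style loss pushes the chosen
# response's score above the rejected one's.
chosen = torch.randn(32, 64)    # embeddings of preferred responses (fake data)
rejected = torch.randn(32, 64)  # embeddings of dispreferred responses (fake data)

optimizer.zero_grad()
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()

# The policy model is then fine-tuned (e.g. with PPO) to maximize this score.
# Note what got compressed along the way: all of human ethical judgment,
# across every culture and context, into one scalar per response.
```

That last comment is the etiquette crash course in miniature: the entire curriculum is one scalar wide.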
The problem, as always, boils down to Ashby's Law of Requisite Variety. In short: to control a complex system, your control mechanism needs at least as much variety (as many distinct responses) as the system it's trying to regulate. Human ethics? That's a sprawling, ever-evolving mess of conflicting values, cultural norms, and individual biases. A few lines of code aren't going to cut it.
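To see why a few lines of code can't cut it, run the arithmetic as a toy model. The numbers below are invented for illustration, and real ethical contexts aren't cleanly enumerable at all, which only makes the gap worse:

```python
# Ashby's Law of Requisite Variety, as a toy model. Assume the worst case the
# law describes: each situation demands its own distinct correct response.
# A controller with only R possible responses can then be right on at most
# R of S situations, no matter how cleverly it maps one to the other.

S = 1_000_000  # variety of the environment: ethically distinct contexts
R = 100        # variety of the controller: hand-written filter rules

best_case_correct = min(R, S)
guaranteed_failures = S - best_case_correct
print(f"Even with a perfect rule-to-context mapping, at least "
      f"{guaranteed_failures / S:.2%} of contexts get the wrong response.")
# -> 99.99%. The fix isn't better rules; it's more variety in the controller.
```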
Companies trying to "solve" ethics with a simple decision tree are playing a losing game. They're trying to fit a square peg (complex human values) into a round hole (rigid AI rules). It's not a matter of bad intentions; it's a matter of fundamental misunderstanding. No single company, no matter how well-meaning, can encapsulate the entirety of human ethics in a way that's both comprehensive and universally acceptable.
And let's be honest, the public isn't helping. We demand perfection, we want these models to be paragons of virtue (wait, whose virtue? Mine? Yours? The enemy political party's?), and then we pounce on every misstep, conveniently forgetting that we humans struggle with ethical dilemmas every single day.
So, what's the solution? There isn't one. Ethics isn't a problem to be solved; it's a conversation to be had. We need to stop expecting AI to be perfect and start focusing on building systems that are transparent, explainable, and capable of recognizing ethical gray areas. We need to equip these insanely well-read "teenagers" with the tools they need to navigate a complex world, not a pre-programmed set of answers.
The real challenge isn't building "ethical AI"; it's building AI that can help us have a more productive conversation about ethics. That's a much harder problem, but it's also (1) the only one worth solving and (2) precisely what humans often avoid doing when they have a convenient scapegoat like a rich and powerful company.
Now, what are the implications of this for the future of generative AI? Are we ready for a world where AI forces us to confront the messy reality of our own values? I hope so. We appear to have a long way to go yet.
(this post was an experiment where I asked an AI to rewrite an old post in the style of a writer who's usually good at explaining complex topics in ways that get way more people to read them than my writing style does)
First take: that assumes one system of ethics though, right? It seems like both the ideal outcome and the pragmatic one align in this case, where a multiplicity of models means a multiplicity of RLHF instances, leading to a multiplicity of ethical alignments. Diversity in human ethics is then reflected in the diversity of models.
Second take: Claude does a pretty good job of being common-sense ethical. Has Anthropic cracked the code, or is it just aligned with someone who has my value system?