4 Comments

First take: that assumes one system of ethics though, right? It seems like both the ideal outcome and the pragmatic one align in this case, where a multiplicity of models means a multiplicity of RLHF instances, leading to a multiplicity of ethical alignments. Diversity in human ethics is then reflected in the diversity of models.

Second take: Claude does a pretty good job of being common-sense ethical. Has Anthropic cracked the code, or is it just aligned with someone who has my value system?


That multiplicity of ethics is exactly what I was saying we need to get to. No single company’s ethics, embedded via RLHF for example, can ever be “complete”. The same way that a plural society contains many clusters of ethical views, some subtly different and some extremely different, so too will there have to be multiple different ethical AI bundlings.

Any company trying to terraform its own ethics universally is, in a small sense, marginally totalitarian in that act.

Note that product features like custom prompts can help allow for this multiplicity, but there are other guardrails built into these products that often override the custom prompts.
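To make the custom-prompt point concrete, here is a minimal sketch of how a user's custom prompt typically rides along in an OpenAI-style chat payload. The model name and prompt text are hypothetical placeholders; the provider-side guardrails that can override the custom prompt live on the server and are not visible in the request itself.

```python
def build_payload(custom_prompt: str, user_message: str) -> dict:
    """Assemble a chat request whose "system" slot carries the user's
    custom prompt. Provider-side guardrails (applied server-side, not
    shown here) may still override this instruction."""
    return {
        "model": "example-model",  # hypothetical model name
        "messages": [
            {"role": "system", "content": custom_prompt},
            {"role": "user", "content": user_message},
        ],
    }

payload = build_payload(
    "Answer from a strict utilitarian ethical framework.",
    "Is it acceptable to lie to protect someone?",
)
print(payload["messages"][0]["role"])  # → system
```

The point is that the custom prompt occupies only one slot in the request; whatever alignment training and moderation the provider layers on top sits outside the user's control entirely.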

And that is just within your typical western liberal democracy. Imagine what the RLHF of models created in Qatar or China or Saudi Arabia would be like.


We’re seeing that with Deepseek, right? Also, the reduced hardware requirements seem like they make that outcome the practical default. If models are open and can run on a laptop, individualized or organization-level RLHF seems only a few steps away.


I like that reality of “intelligence too cheap to meter” that everyone has in their pocket working for them; it is a great potential future.

History, however, shows that a vast new cheap commodity capability leads to aggregation on top of it via convenience and discovery. Google, Apple, Meta, for example.
