Constraining an LLM for More Values-Aligned Interactions
Experiments in Constraint Engineering
I’ve had a simple question stuck in my head for a while: Are there ways to use LLMs that support my values, even though the default use does not have those values in mind?
Shaping this question are Shannon Vallor’s concept of polypotency and the connected concept of stabilities from Don Ihde. These terms are near-synonyms: both point to the way a technology can have many different uses and outcomes even though it was designed with only specific ones in mind1. These concepts force me to ask which polypotencies/stabilities are present in LLMs, and whether any of them open a possibility that enables me to flourish.
Every day for the past few weeks, I’ve been working with LLMs to see if I could do just that: bend the technology toward some of my values. I wanted to make sure that my AI use didn’t diminish my curiosity, intellectual humility, or intellectual integrity2. Even more, I wanted to see if I could grow in these values while using LLMs. After my initial experiments, I’m convinced that even though LLMs might not share my epistemic values, I can still shape our interactions so that they improve my thinking and my writing. This is what I mean by principled use: learning to incorporate one’s values into one’s interactions with technology.
The default use of LLMs, the way I see many non-technical people use them, is to simply ask a question and then use the AI’s response as-is. They might ask how something works, or they might ask for an essay on a topic using some simple text as input. These default uses violate my values. In fact, they actively undermine them.
By copy-pasting AI work, I’m actively working against every one of my values3. Copy-pasting doesn’t help me stay honest with myself about what I know, since it presents work as something I know when I don’t. It also doesn’t keep me grounded in alternative perspectives or help me acknowledge the gaps in my own thinking. This is how LLMs flatten worlds of knowledge: by prioritizing commoditized outputs over the parts of the experience that actually lead to change and growth. With the default use being inadequate, then, we need to move toward principled use.
What I noticed first in my experiment is that I already had a skill that would prove critical in the pursuit of principled use of AI: metacognition. Metacognition is thinking about thinking, and understanding the patterns behind that thinking. If you want better interactions with AI, you have to be aware of what your interaction goal is, what the pitfalls of the system are, and what your own pitfalls as a human being are, and you have to notice when any of those things start to drift and how. LLMs, especially in extended interactions, have a tendency to drift. As an example, I was using Claude Desktop to think about an AI editing tool I’m developing, and as I was engaging with the system about how paragraph edits should work, it suddenly and inexplicably shifted to article-wide edits. Article-wide edits aren’t as important as the paragraph-by-paragraph feature I was working on, yet the system drifted anyway. Because I was aware of my goal (to make sure I understood how I wanted the system to work, and its implications), the drift never became a problem. I simply re-prompted and carried on with the work.
As I applied what I was learning about my LLM interaction preferences, I settled on the term Constraint Engineering4 for how I was approaching this work. The goal of constraint engineering is to use system prompts, and prompts more generally, to constrain the kinds of behavior you don’t want while also describing the behavior you do want. Built into the idea of constraint engineering is metacognition, since constraining an LLM requires constant evaluation of the interaction and of how it influences one’s thinking. In my case, I was engineering constraints for the system so that principled use was the default. You see examples of constraint engineering all the time in how people prompt; it’s usually in all caps. One of my favorite joke attempts at constraint engineering is “DO NOT HALLUCINATE”. But prompts for constraint engineering don’t have to be in all caps; they can be instructions that guide behavior: “don’t provide me the answer, but engage in a dialog that helps me understand X topic better”. This is the system equivalent of Odysseus tying himself to the mast to hear the sirens, I admit, but it’s been an effective approach.
Here are some examples of how I constrain my interactions and the thought processes behind them:
“Critical engagement: push back on logical gaps, force precision in reasoning.”
I’d rather deal with an AI that seeks to poke holes in my argument than one that tries to present things as correct when they’re not. Even if the AI ends up highlighting a logical gap that doesn’t exist, it makes me dive deeper into understanding what I’m saying.
“Work WITH me through problems, DON’T generate solutions for me”
The goal in my interactions with AI is to see if I can improve my ability to think clearly and to communicate more accurately the abstract ideas I’m working with. When I present a problem to the LLM, it’s not for the LLM to solve; it’s for me to solve, helped along by constructed questions, responses, and the like that give me more to think about.
“Act as a Socratic tutor”
This is more of the above. A Socratic tutor doesn’t so much give answers as help the person being tutored by constructing questions around their current understanding. I got this idea from reading the book “Brave New Words” by Salman Khan, founder of Khan Academy. The point is not to be given answers so much as to scaffold out from your current understanding toward what you want to understand.
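To make the mechanics concrete, here is a minimal sketch of how constraints like these can be bundled into a single system prompt. I set them through Claude Desktop’s preferences rather than code, but the same idea carries over to the Anthropic Python SDK; the model name and prompt wording below are placeholders for whatever you actually use, not a prescription.

```python
import anthropic

# The constraints from above, bundled into one reusable system prompt.
CONSTRAINTS = """\
Critical engagement: push back on logical gaps, force precision in reasoning.
Work WITH me through problems, DON'T generate solutions for me.
Act as a Socratic tutor: ask questions that scaffold out from my current understanding.
"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: use whichever model you prefer
    max_tokens=1024,
    system=CONSTRAINTS,  # the constraints apply to the whole conversation
    messages=[
        {
            "role": "user",
            "content": "Help me think through how paragraph-level edits should work in my editing tool.",
        }
    ],
)

print(response.content[0].text)
```

The system prompt is doing the same job as the preferences field in a chat client: it frames every exchange before any individual question is asked.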
One type of feature in LLM systems that makes constraint engineering, and therefore principled use, more effective is anything that lets you set personal preferences across the whole system. With personal preferences5, I can constrain my interactions at the preferences level instead of prompt by prompt. Effectively, this means I handle the bulk of the constraint setting at the preferences level, using the prompts above (and more), and then I only need to fine-tune my constraints per prompt.
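As a rough sketch of how I think about that split, suppose the standing constraints live in a local preferences.txt file (a hypothetical stand-in for a chat client’s preferences feature); per-prompt fine-tuning then just gets layered on top when a particular task calls for it.

```python
from pathlib import Path

# Standing constraints, set once at the "preferences" level.
BASE_CONSTRAINTS = Path("preferences.txt").read_text()


def build_system_prompt(extra: str = "") -> str:
    """Combine the standing preferences with any per-prompt fine-tuning."""
    if not extra:
        return BASE_CONSTRAINTS
    return f"{BASE_CONSTRAINTS}\n\nFor this task only: {extra}"


# Example: tighten the focus for one session without touching the preferences.
system_prompt = build_system_prompt(
    "Stay on paragraph-level edits; don't drift into article-wide edits."
)
```

The point of the split is that the values-level constraints stay stable while the task-level ones change with each interaction.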
The constraint engineering approach is still limited. I run into cases where the LLM seems to ignore my preferences and offers up an answer anyway. This is more of a frustration than a roadblock. It means I have to stay on my toes cognitively to make sure I’m having the kind of interactions I want, even after trying to impose significant constraints at a higher level.
That being said, my initial forays into figuring out how best to constrain LLMs in ways that help me align with my values have been successful. I’m having many more interactions with the LLM in which I’m augmenting my capacity, and I’m learning new skills as I attempt to do so. What this kind of technology needs is many people working with it, trying to bend it to their values. It’s not going to be successful for everyone, and that’s the point. We need to see the polypotencies, the stabilities, that can be expressed with AI technology, and that only happens when we have the will to move the technology toward some other end.
1. Don Ihde’s “stabilities” is technically more about socio-technical configurations and Shannon Vallor’s “polypotency” seems more about a technomoral perspective, but for how I’m using them in this article, they’re the same.
2. I considered using the terms epistemic humility and epistemic integrity, but this article was already ballooning conceptually, so I went with the more generic terms.
3. This isn’t to say that copy-pasting is universally bad, but rather that it isn’t a condition under which I flourish.
4. I don’t actually like Constraint Engineering as a term. I hope to one day find a better one, but for right now it works well for what’s going on, especially as a parallel to “prompt engineering” or “context engineering”.
5. All of the major LLM chatbots have this feature. I use Claude Desktop, however, and my terminology will be what I use in Claude Desktop.

