Large Language Models (LLMs) are no longer experimental or limited to research labs.
They’re now embedded in products, powering customer service agents, writing assistants, data extraction systems, and even legal or financial workflows. But here’s what separates impressive from average: prompt engineering.
Despite what it looks like on the surface, prompting isn’t just about asking the model to “write me an email” or “summarise this PDF”. The difference between a prompt that works once and a prompt that works reliably across hundreds of users, edge cases, and languages — that’s engineering.
This guide will take you through the full stack of prompt engineering: from model configuration to tactical prompting methods, and real-world considerations for scalable AI agent deployment.
LLMs run on a simple principle: next-token prediction. The model takes in a sequence of words (tokens) and predicts the next one, again and again, until it hits a stop condition.
So, when you give it a prompt like:
“Summarise this article in three bullet points:…”
it’s not “thinking” or “understanding” the way we humans do. It’s just completing text based on patterns it has seen during training.
Prompt engineering is the process of designing that input text to guide the model toward a useful, accurate, and repeatable output. It’s part instruction manual, part design system, and part debugging tool.
Done right, prompting reduces hallucinations, keeps output formats consistent, and makes behavior predictable across users and edge cases.
Choosing the right model is just the first step. Before writing prompts, it’s important to adjust the model’s settings. On our platform, these five settings matter most—they shape how the AI understands, retrieves, and responds to your input:
| Setting | What it does | How to use it |
| --- | --- | --- |
| Base Prompt (Bot Persona) | Defines the personality and tone of the AI. | Set a clear persona for more consistent and tailored responses. |
| Restrictions | Sets boundaries on the AI’s behavior and responses. | Use to avoid certain topics or ensure safe, appropriate replies. |
| Temperature | Controls how creative or focused the AI is. | Use 0 for predictable tasks (like summaries), 0.7–0.9 for more open-ended ones. |
| Knowledgebase Nodes | Connects the AI to specific sources of truth. | Add relevant nodes so the AI can retrieve and reference trusted information. |
| Previous Message Limits | Controls how many past messages the AI can “remember.” | Increase for better context, decrease for speed or short interactions. |
While not required, these additional settings can help you fine-tune responses even more, especially for advanced or creative use cases:
| Setting | What it does | How to use it |
| --- | --- | --- |
| Top-K | Limits choices to the top K most likely next words. | Keep under 50 to reduce randomness and keep responses focused. |
| Top-P (nucleus) | Chooses from the smallest set of words that cover a probability threshold. | Use 0.9–0.95 for more natural, creative responses. |
| Max Tokens | Limits the length of the AI’s response. | Lower values help control output length or cost in workflows. |
⚠️ Note: If temperature is set to 0, both Top-K and Top-P are ignored. The AI always picks the most likely next word (greedy decoding).
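To make these settings concrete, here is a minimal sketch of how they might be passed to a model through an OpenAI-style chat API. The client, model name, and exact parameter names are illustrative assumptions, not the settings screen of any particular platform.

```python
# Minimal sketch: passing decoding settings through an OpenAI-style chat API.
# The client, model name, and parameter set are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[
        # Base prompt / persona
        {"role": "system", "content": "You are a concise, friendly support assistant."},
        {"role": "user", "content": "Summarise our refund policy in two sentences."},
    ],
    temperature=0,    # 0 = greedy decoding; Top-P/Top-K become irrelevant
    top_p=1.0,
    max_tokens=150,   # cap response length (and cost)
)

print(response.choices[0].message.content)
```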
Each task requires a slightly different prompting strategy. Here’s a breakdown of the most effective approaches used in production LLM systems.
Zero-shot means giving the model just the task — no examples, no context.
Example:
Classify this review as POSITIVE, NEGATIVE or NEUTRAL:
"The product arrived late, but the customer support team was helpful."
Output: NEUTRAL
Best when the task is familiar to the model, or latency is a concern.
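As a minimal sketch (assuming an OpenAI-style Python client and a hypothetical model name), a zero-shot call is just the instruction plus the input, with temperature 0 for a deterministic label:

```python
# Zero-shot classification: only the task and the input, no examples.
from openai import OpenAI

client = OpenAI()

review = "The product arrived late, but the customer support team was helpful."
prompt = (
    "Classify this review as POSITIVE, NEGATIVE or NEUTRAL.\n"
    f'Review: "{review}"\n'
    "Sentiment:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic, single-word label
    max_tokens=5,
)

print(response.choices[0].message.content.strip())  # e.g. "NEUTRAL"
```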
By providing examples within the prompt, you teach the model a pattern — useful when the task is nuanced or less common.
Example: Pizza order parser
"I want a small pizza with cheese, tomato sauce and olives."
{
"size": "small",
"type": "normal",
"ingredients": [["cheese", "tomato sauce", "olives"]]
}
Now parse:
"I want a large pizza, first half mushrooms and cheese, second half tomato and ham"
Output:
{
"size": "large",
"type": "half-half",
"ingredients": [["mushrooms", "cheese"], ["tomato", "ham"]]
}
Few-shot prompting is extremely effective for structured output like JSON, XML, or SQL.
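Here is a hedged sketch of how this might look in code: the worked example is embedded in the prompt and the reply is parsed as JSON. The client and model name are assumptions, and real output should still be validated (see the note on json-repair later in this guide).

```python
# Few-shot parsing into JSON: the worked example is embedded in the prompt.
import json

from openai import OpenAI

client = OpenAI()

EXAMPLE = '''Order: "I want a small pizza with cheese, tomato sauce and olives."
JSON: {"size": "small", "type": "normal", "ingredients": [["cheese", "tomato sauce", "olives"]]}'''


def parse_order(order: str) -> dict:
    prompt = (
        "Parse pizza orders into JSON. Follow the example format exactly.\n\n"
        f"{EXAMPLE}\n\n"
        f'Order: "{order}"\nJSON:'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # structured output should be deterministic
    )
    return json.loads(response.choices[0].message.content)


print(parse_order(
    "I want a large pizza, first half mushrooms and cheese, second half tomato and ham"
))
```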
System and role prompts control the “voice” and perspective of the model.
Example:
You are an expert reviewer. Classify the sentiment and return in JSON format:
Review: "Heir is a beautifully haunting film. I couldn’t stop thinking about it."
Output:
{
"movie_reviews": [
{
"sentiment": "POSITIVE",
"name": "Heir"
}
]
}
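As a sketch, the same idea maps naturally onto system and user messages in a chat-style API (client and model name are assumptions): the system message fixes the persona and output contract, while the user message carries only the content to analyse.

```python
# Role/system prompting: persona and output format live in the system message.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are an expert movie reviewer. Always reply with valid JSON of the form "
                '{"movie_reviews": [{"sentiment": "...", "name": "..."}]}.'
            ),
        },
        {
            "role": "user",
            "content": 'Review: "Heir is a beautifully haunting film. I couldn\'t stop thinking about it."',
        },
    ],
    temperature=0,
)

print(response.choices[0].message.content)
```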
For tasks that need logic or reasoning, Chain of Thought (CoT) prompting forces the model to show its steps.
Without CoT:
I’m 20 now. When I was 3, my partner was 3x my age. How old is my partner now?
→ 63 (Wrong)
With CoT:
Let’s think step by step…
→ My partner was 9 when I was 3. That’s a 6-year difference. I’m 20 now, so they’re 26.
✅ Correct answer: 26
This technique prompts the LLM to first consider a more general question related to the specific task. The answer to this general question is then fed back into a subsequent prompt for the specific task.
This encourages the LLM to access relevant background knowledge and reasoning before tackling the problem, potentially leading to more accurate and insightful responses.
Example flow: first ask the model a general, step-back question (for example, “What factors make this kind of task succeed?”), then include its answer as context in the prompt for the specific task.
This process can help improve accuracy and mitigate biases by focusing on underlying principles.
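A minimal sketch of that two-step flow, assuming an OpenAI-style client and a hypothetical model name; the game-design topic here is just a placeholder task:

```python
# Step-back prompting as two calls: a general question first, then the specific task
# with the general answer supplied as context.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # hypothetical model choice


def ask(prompt: str, temperature: float = 0.2) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content


# Step 1: the general, "step-back" question.
principles = ask("What are the key ingredients of an engaging first-person shooter level?")

# Step 2: feed those principles back into the specific task.
level_idea = ask(
    "Using these principles as context:\n"
    f"{principles}\n\n"
    "Write a one-paragraph storyline for a new level in a first-person shooter game."
)

print(level_idea)
```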
Both Self-Consistency and Tree of Thoughts (ToT) build on the idea of prompting models to reason step by step, but they take different approaches: Self-Consistency samples several reasoning paths at a higher temperature and keeps the most common answer, while ToT lets the model branch into multiple intermediate thoughts and evaluate which branch is worth pursuing.
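As an illustration of Self-Consistency, here is a hedged sketch that samples the same chain-of-thought question several times at a high temperature and keeps the majority answer (client and model name are assumptions):

```python
# Self-Consistency: sample several chains of thought, then take a majority vote.
from collections import Counter

from openai import OpenAI

client = OpenAI()

QUESTION = (
    "I'm 20 now. When I was 3, my partner was 3x my age. How old is my partner now? "
    "Think step by step, then give only the final number on the last line as 'ANSWER: <number>'."
)


def sample_answer() -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0.9,      # high temperature so the reasoning paths differ
    )
    text = response.choices[0].message.content
    return text.rsplit("ANSWER:", 1)[-1].strip()


votes = Counter(sample_answer() for _ in range(5))
print(votes.most_common(1)[0][0])  # the most frequent answer, e.g. "26"
```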
ReAct (Reason and Act) prompting is used when the model needs to both think and act, like planning a search, making API calls, or manipulating data.
Example task: “How many kids do the members of Metallica have?”
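Below is a simplified sketch of a ReAct loop: the model alternates Thought and Action steps, the code executes each action and feeds the Observation back. The client, the model name, and the search tool are stand-ins; agent frameworks usually manage this loop for you, but the shape is the same.

```python
# Simplified ReAct loop with a stubbed search tool.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # hypothetical model choice


def web_search(query: str) -> str:
    # Stub tool: wire this up to a real search API in practice.
    return f"(stub search result for: {query})"


transcript = """Answer the question by interleaving Thought, Action and Observation steps.
The only available action is Search[query]. End with: Final Answer: <answer>.

Question: How many kids do the members of Metallica have?
"""

for _ in range(5):  # cap the number of reason/act turns
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": transcript}],
        temperature=0,
        stop=["Observation:"],  # stop before the model invents its own observation
    )
    step = response.choices[0].message.content
    transcript += step
    if "Final Answer:" in step:
        print(step.split("Final Answer:", 1)[1].strip())
        break
    if "Search[" in step:
        query = step.split("Search[", 1)[1].split("]", 1)[0]
        transcript += f"\nObservation: {web_search(query)}\n"
```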
Writing effective prompts can be time-consuming. Automatic Prompt Engineering (APE) aims to automate this process. One approach involves using an LLM to generate multiple prompt candidates for a given task.
These candidates are then evaluated (e.g., using metrics like BLEU or ROUGE), and the best-performing prompt is selected. This iterative process can help discover effective prompts without extensive manual effort.
Example flow (generating chatbot order phrases): ask the model to produce several candidate phrasings, evaluate each candidate against a small reference set (for example with BLEU or ROUGE scores), and keep the best-performing one.
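A rough sketch of that loop, with an OpenAI-style client, a hypothetical model name, and a crude containment check standing in for BLEU/ROUGE scoring:

```python
# Rough APE sketch: the model proposes candidate prompts, each is scored on a tiny
# eval set, and the best one is kept.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # hypothetical model choice


def ask(prompt: str, temperature: float = 0.0) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content


# Step 1: generate candidate instruction prompts.
candidates = [
    line.strip()
    for line in ask(
        "Generate 5 different instruction prompts that tell a chatbot to parse a "
        "customer's pizza order into JSON. Return one prompt per line, no numbering.",
        temperature=0.9,
    ).splitlines()
    if line.strip()
]

# Step 2: score each candidate on a tiny labelled eval set and keep the best.
EVAL_SET = [('I want a small pizza with cheese', '"size": "small"')]


def score(candidate: str) -> int:
    hits = 0
    for order, expected_fragment in EVAL_SET:
        output = ask(f'{candidate}\n\nOrder: "{order}"')
        hits += expected_fragment in output  # crude stand-in for BLEU/ROUGE
    return hits


print("Selected prompt:", max(candidates, key=score))
```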
Becoming proficient in prompt engineering requires experimentation and a few practical habits, one of the most important being to validate structured output before using it downstream; the json-repair Python package can help fix malformed JSON.
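For example, a minimal sketch assuming the open-source json_repair package (installed with pip install json-repair):

```python
# Recover from slightly malformed model output before parsing it.
import json

from json_repair import repair_json

raw_output = '{"size": "large", "type": "half-half", "ingredients": [["mushrooms", "cheese"], ["tomato", "ham"]],}'  # note the trailing comma

fixed = repair_json(raw_output)  # returns a valid JSON string
order = json.loads(fixed)
print(order["size"])  # "large"
```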
Prompt engineering isn’t confined to labs or hackathons; it plays a critical role in real-world business operations. Whether it’s in customer support, internal tools, or sales automation, the quality of prompts directly shapes how an AI agent behaves.
Let’s say a user sends this message via WhatsApp:
“Need a refund. Got the wrong size.”
Now compare two approaches:
Vague prompt: “Handle the refund of the user.”

Structured prompt: “You are a support agent. Ask the user for the order ID. Confirm the reason for refund. Offer options for refund or exchange.”
The second one provides structure, tone, and a flow — reducing back-and-forth, ensuring a smoother experience.
Prompt engineering also enhances the relevance of suggestions:
Vague prompt: Recommend something based on purchase: {{purchases}}

Better prompt:
Act as a shopping assistant for an online store. Based on the user's last 3 purchases and wishlist, suggest 2 complementary products under $100.
----
Last 3 purchases: “{{purchases}}”
Wishlist items: “{{wishlist}}”
You move from generic advice to contextual, value-based upselling.
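In practice, the {{purchases}} and {{wishlist}} variables are filled in before the prompt is sent. Here is a minimal sketch using Python string formatting in place of the platform’s double-brace syntax, with made-up data values:

```python
# Filling prompt variables before sending the prompt to the model.
TEMPLATE = """Act as a shopping assistant for an online store. Based on the user's last 3 purchases and wishlist, suggest 2 complementary products under $100.
----
Last 3 purchases: "{purchases}"
Wishlist items: "{wishlist}"
"""

prompt = TEMPLATE.format(
    purchases="running shoes, fitness tracker, water bottle",
    wishlist="wireless earbuds, yoga mat",
)

print(prompt)  # send this to the model with a low temperature for consistent suggestions
```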
In operational use cases like invoice parsing, accuracy is more important than originality. Prompting must be strict:
Extract invoice number, total amount, and due date from the uploaded document. Return output in valid JSON.
This clarity ensures the model doesn’t hallucinate or drift from format.
For sales agents, a lead-qualification prompt might look like this:
You are a SaaS product advisor. Ask the user about their company size and use case. If the company has over 50 employees, mark them as Enterprise.
This reduces manual hand-offs and keeps pipelines clean.
Across all these applications, one pattern holds true: vague prompts lead to vague outputs. Whether you’re a business owner, developer, marketer, or CX manager, prompt design is no longer optional; it’s a key operational skill.
Prompt engineering is how you turn AI from a demo into a dependable tool. It’s not about getting the model to respond — it’s about getting it to respond the right way, every time.
With well-structured prompts, you get consistent formats, fewer hallucinations, and predictable behavior. Whether you’re working on internal tools or customer-facing agents, clear prompting is the difference between AI that works and AI that breaks.
It’s how you ship faster, reduce friction, and stay in control — without writing a single line of code.