
Gary Constable AKA GhostFrog

Builder of AI Agents, Data Pipelines & Automation Systems

Lesson 1: LLMs Are Not Functions (And Why That Changes Everything)

2025-12-14

I started this project assuming an LLM call worked like a normal function:

input → output

Same input, same output. Simple.

That assumption lasted about half a day.

What followed was my first real AI engineering lesson — not about prompts, but about control.


The mistake most people make

Most people (including me, initially) treat LLMs like this:

  • write a prompt
  • call the API
  • parse the text
  • hope it behaves

That works… until it doesn’t.

LLMs are not deterministic functions. They are probabilistic generators. If you don’t explicitly constrain them, they will:

  • change answers
  • change structure
  • emit multiple outputs
  • break contracts
  • fail in ways that are hard to reproduce

I hit all of those in Lesson 1.


LLMs are not functions

A real function looks like this:

f(x) → y

An LLM call is closer to:

f(x, params, runtime_state) → {y₁, y₂, y₃, ...}

The model produces a distribution of possible continuations.
Your job as an engineer is to decide which ones you will accept.

Until you do that, you don’t have a system — you have a demo.
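
To make that concrete, here is a minimal sketch of sampling the same prompt several times (assuming the OpenAI Python SDK; the model name and prompt are just examples, and any chat API that can return multiple completions shows the same thing):

from collections import Counter
from openai import OpenAI

client = OpenAI()

def sample(prompt: str, n: int = 5) -> list[str]:
    # n asks the API for several independent completions of the same prompt
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=0.8,
    )
    return [choice.message.content for choice in resp.choices]

print(Counter(sample("One word: is 'refund my order' billing or support?")))
# You don't get a single y back. You get a frequency distribution over answers.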


Temperature is not creativity — it is risk

The first control knob I learned to respect was temperature.

  • temperature = 0
    → greedy decoding
    → pick the single most likely token every time

  • higher temperature
    → more exploration
    → more variation
    → more structural deviation

At high temperature, I didn’t just get different answers — I got:

  • extra text after JSON
  • multiple outputs in a single response
  • broken parsing
  • runtime crashes

That was the moment it clicked:

Randomness doesn’t just affect content.
It affects structure, control flow, and contracts.
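
Here is the experiment that made it click, as a sketch (same assumed SDK and example model name as above):

from openai import OpenAI

client = OpenAI()

def classify(ticket: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",      # example model name
        temperature=temperature,  # 0 = greedy decoding, higher = more exploration
        messages=[{"role": "user", "content": f"Return JSON only. Classify: {ticket}"}],
    )
    return resp.choices[0].message.content

print(classify("My invoice is wrong", temperature=0.0))  # same shape every run
print(classify("My invoice is wrong", temperature=1.5))  # run this a few times and
# sooner or later you get trailing prose after the JSON, or two objects at once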


Why schema matters (and why it still isn’t magic)

I added a strict JSON schema to enforce output shape:

  • exactly one key
  • enum-only values
  • no extra properties
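
As a sketch, the schema looked something like this, written here for the jsonschema package (the key name and enum values are illustrative, not my actual ones):

import json
import jsonschema

DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        "decision": {"enum": ["search", "summarize", "escalate"]},  # enum-only values
    },
    "required": ["decision"],        # the one key is required...
    "additionalProperties": False,   # ...and nothing else is allowed
}

jsonschema.validate(json.loads('{"decision": "search"}'), DECISION_SCHEMA)  # passes
jsonschema.validate(json.loads('{"decision": "maybe"}'), DECISION_SCHEMA)   # raises ValidationError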

That helped — but it didn’t make the system bulletproof.

At high temperature, the model still managed to:

  • emit multiple valid JSON objects
  • concatenate them into one string
  • break json.loads()

The fix wasn’t “better prompts”.
The fix was validation and explicit assumptions about what the system would accept.
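
Concretely, the first layer of validation was refusing to parse anything that isn’t exactly one JSON object. A minimal sketch:

import json

def parse_single_object(raw: str) -> dict:
    try:
        # json.loads raises "Extra data" on concatenated objects like {...}{...}
        obj = json.loads(raw.strip())
    except json.JSONDecodeError as e:
        raise ValueError(f"not a single JSON document: {e}") from e
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object, got something else")
    return obj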


The real abstraction

The correct abstraction is not:

LLM → answer

It is:

LLM → candidates → validation → selection → result

Only after validation does the system behave like a function.

That wrapper — not the model — is the real unit of engineering.
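
A hedged sketch of that wrapper, in plain Python: generate stands in for any call that returns one raw candidate string, and validate is something like the schema check above.

from typing import Callable

def reliable_call(
    generate: Callable[[], str],      # LLM → one raw candidate
    validate: Callable[[str], dict],  # raises ValueError on anything off-contract
    max_candidates: int = 3,
) -> dict:
    for _ in range(max_candidates):   # LLM → candidates
        raw = generate()
        try:
            return validate(raw)      # validation → selection: first valid wins
        except ValueError:
            continue                  # reject and sample again instead of crashing
    raise RuntimeError("no candidate satisfied the contract")  # explicit failure

First-valid-wins is the simplest selection policy; smarter ones, like voting, are Lesson 2 territory.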


Decision paths vs generative paths

This lesson led to a clean rule I now follow:

  • Decision-making paths
    (classification, routing, tool selection)
    → temperature = 0
    → strict schema
    → exactly one output

  • Generative paths
    (blog posts, summaries, ideas)
    → higher temperature
    → variation is a feature

Writing a blog post should be creative.
Choosing the correct tool should not.
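
In code, the rule collapses to two parameter presets (OpenAI-style parameter names; the values are illustrative, not tuned):

DECISION_PARAMS = {
    "temperature": 0.0,                          # greedy: same input, same choice
    "response_format": {"type": "json_object"},  # shape enforced, not hoped for
}

GENERATIVE_PARAMS = {
    "temperature": 0.9,                          # variation is the feature
}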


The takeaway

Lesson 1 wasn’t about prompt engineering.

It was about this shift:

LLMs are unreliable by default.
Reliable systems wrap them until they behave like functions.

That means:

  • constraints
  • contracts
  • validation
  • retries
  • explicit assumptions

Once you accept that, AI stops feeling magical — and starts feeling like engineering.



Lesson 2 will cover retries, voting, and how to design systems that stay reliable even when the model is uncertain.
