What makes sunmoon.dev different from an agency?

You work directly with me — a SaaS founder who builds and operates his own AI products (transcribe.so, goodlisten.co). There's no account manager, no junior bench, and no hand-off. The person who scopes your build is the person who ships it.

How long does it take to build an AI application?

It depends on complexity. A basic AI agent can ship in 4–6 weeks; more involved solutions with custom features take 3–6 months. You'll get a concrete timeline on the first call before any commitment.

Do you provide ongoing support after launch?

Yes. Every build includes 6 months of email support, and I offer a discounted on-call package for extended maintenance and feature work as you scale.

What technologies do you use?

Industry-standard, mostly open tooling to avoid vendor lock-in: Next.js and Node for the app layer, Postgres/Supabase for data, and the right AI models picked per task (OpenAI, Qwen, Mistral, and others). Exact stack is chosen around your requirements.

How do you handle data privacy and security?

Encryption in transit and at rest, scoped access, and compliance-aware design. Where it matters, solutions can be deployed on your own infrastructure so you keep full control of your data.

How does pricing work?

Transparent and upfront. Consulting starts at $200/hour; AI agent and full-stack SaaS builds start at $10,000, with final cost depending on scope. You get a detailed quote after the first call.

How to Vet a Freelance AI Engineer (Questions That Actually Filter)

You posted a job for an AI engineer and got forty applications. Every one of them says "LLM expert." Every one of them has a portfolio full of chatbots and a GitHub with three forked repos. You're about to pay someone $80–$200 an hour to build something your business will depend on, and you have no reliable way to tell who can actually do it.

Since 2023, the cost of producing an impressive AI demo has dropped to roughly zero. Anyone can wire GPT to a chat window in an afternoon. The cost of operating an AI feature in production — where real users do unreasonable things, models return garbage, and the bill arrives monthly — has not dropped at all. The entire vetting problem reduces to one question: is this person a builder or a demo-maker?

I sit on both sides of this. I hire contractors for my own products, and I'm the freelance AI engineer founders try to vet. After years as a senior engineer at Spotify and Klarna, time with a Y Combinator–backed startup, and a stretch as a Supabase Expert Partner, I now run my own AI products full-time. The questions below are the ones I'd want a client to ask me — because they're the ones demo-makers can't fake.

Why portfolios don't work anymore

A portfolio shows you that something existed for the duration of a screen recording. It doesn't show you:

Whether the thing handled a malformed input without crashing
Whether it still worked the week after the model provider shipped a silent update
Whether anyone other than the builder ever used it
What it cost to run at any volume above "me, testing it"

Demo-makers optimize for the recording. Builders optimize for month three. The questions that filter are the ones about month three.

The four questions that actually filter

1. "Show me something you operate in production"

Not "built." Operate. Present tense. Something with real users, a real bill, and a real on-call story.

This single question eliminates most of the field. A demo-maker will show you a Loom of a prototype. A builder will show you a live URL, tell you the monthly active user numbers (even if small), and, as the real tell, start volunteering the ugly parts unprompted: "the ASR provider rate-limits us during US business hours, so we built a queue," or "we had to add a fallback because the primary model went down twice in March."

When someone asks me this, I point at transcribe.so, my multi-model transcription product, and goodlisten.co, my AI podcast discovery and creator studio. Both are live, both have paying-traffic-shaped problems, and both have taught me things no client project ever did — because when it's your own infra bill and your own churn, you can't hand the problem back.

Follow-ups that deepen the signal:

"What broke last month?" (Builders have an immediate answer. Demo-makers say "nothing.")
"What does it cost to run?" (If they don't know their own unit economics, they won't know yours.)
"How many users?" (Small honest numbers beat vague big ones.)

2. "How do you evaluate whether the AI is actually working?"

This is the most technical question on the list and the one that most cleanly separates 2023-era prompt tinkerers from engineers.

A weak answer: "I test it with a bunch of prompts and check the outputs look good." That's vibes. Vibes don't survive a model version bump.

A strong answer mentions some concrete subset of:

A golden dataset — a fixed set of inputs with known-good outputs, run on every change
Regression evals — so a prompt tweak that improves case A can't silently break case B
Task-specific metrics — word error rate for transcription, citation accuracy for RAG, exact-match for extraction — not just "the LLM judge liked it"
Eval-before-upgrade discipline — new model versions get run against the eval suite before they touch production

For transcribe.so I keep a golden audio corpus that runs against the pipeline after any change to chunking, aggregation, or prompts. It has caught regressions that looked fine in spot checks. Anyone who has operated an AI product for more than a few months has independently invented some version of this — and anyone who hasn't will look at you blankly.
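A minimal sketch of what a golden-dataset regression check with a task-specific metric can look like, using word error rate as the example. This is an illustration under stated assumptions, not the actual transcribe.so harness: `GoldenCase`, `run_regression`, the `transcribe` callable, the stored `baseline` scores, and the 0.02 tolerance are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


@dataclass
class GoldenCase:
    case_id: str
    audio_path: str     # hypothetical pointer to a fixed input
    expected_text: str  # known-good transcript for that input


def run_regression(
    cases: List[GoldenCase],
    transcribe: Callable[[str], str],
    baseline: Dict[str, float],
    tolerance: float = 0.02,  # assumed slack before a case counts as regressed
) -> Dict[str, float]:
    """Return the cases whose WER is worse than the stored baseline."""
    regressions = {}
    for case in cases:
        wer = word_error_rate(case.expected_text, transcribe(case.audio_path))
        if wer > baseline.get(case.case_id, 0.0) + tolerance:
            regressions[case.case_id] = wer
    return regressions


if __name__ == "__main__":
    cases = [GoldenCase("greeting", "audio/greeting.wav", "hello world how are you")]
    # Stand-in for a real pipeline: it gets one word wrong.
    fake_pipeline = lambda path: "hello word how are you"
    print(run_regression(cases, fake_pipeline, baseline={"greeting": 0.0}))
```

Running the same function against a candidate model version before it reaches production is one way to implement the eval-before-upgrade step: an empty result means no golden case got worse than the baseline.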

3. "What happens when the model is wrong?"

Models are wrong. Not occasionally — routinely, at some percentage, forever. The engineering is not in preventing that; it's in deciding what the product does when it happens.

You're listening for failure-mode thinking:

Detection: confidence thresholds, output validation, schema checks, citation verification
Degradation: fallback models, retries with different parameters, graceful "I don't know" states
Containment: which actions the AI can take autonomously vs. which need a human or a hard rule in front of them
Honesty in the UI: does the product signal uncertainty to the user, or does it present hallucinations with full confidence?

Demo-makers design for the model being right. The case where it's wrong is the one that emails support, leaves the one-star review, or sues you — designing for that case is the actual job.

If the candidate has never thought about what their system does on a bad output, every failure becomes your incident, discovered by your customers.
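To make the detection and degradation items above concrete, here is a small sketch for a structured-extraction task. Everything named in it is an assumption for illustration: the `REQUIRED_FIELDS` schema, the `extract_with_fallback` function, and the idea that each model is wrapped as a plain callable taking a prompt and returning text.

```python
import json
from typing import Callable, Optional, Sequence

# Hypothetical schema for an invoice-extraction task.
REQUIRED_FIELDS = {"invoice_number": str, "total": (int, float)}


def validate(raw: str) -> Optional[dict]:
    """Detection: accept output only if it is JSON that matches the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data


def extract_with_fallback(
    prompt: str, models: Sequence[Callable[[str], str]]
) -> dict:
    """Degradation: try each model in order, then admit uncertainty."""
    for call_model in models:
        try:
            result = validate(call_model(prompt))
        except Exception:
            # Outage, timeout, or rate limit: move on to the next model.
            continue
        if result is not None:
            return {"status": "ok", "data": result}
    # No model produced valid output: flag for review instead of guessing.
    return {"status": "needs_review", "data": None}


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        return "Sure! Here is the invoice data you asked for."  # not JSON

    def fallback(prompt: str) -> str:
        return '{"invoice_number": "A-1", "total": 12.5}'

    print(extract_with_fallback("Extract the invoice fields.", [primary, fallback]))
```

The `needs_review` status is the part the UI can build on: it gives the product an explicit state to render as "we could not read this" rather than showing a guess with full confidence.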

4. "Who owns the code, and can I run it without you?"

This one is about your downside. Freelance engagements end — sometimes well, sometimes not — and the asymmetry of a bad ending is brutal if you didn't ask this up front.

Concretely, get clear answers on:

IP assignment — you own the code outright on payment, in writing, no "license to use"
Repo location — it lives in your GitHub org from day one, not theirs
Accounts and keys — model provider accounts, hosting, and DNS are yours; the engineer gets invited in, never the reverse
Bus factor — is there a README and deployment doc good enough that another competent engineer could take over in a week?

A professional will agree to all of this immediately and probably bring it up before you do. Resistance on any point — especially repo ownership or API keys living in their personal account — is disqualifying, not negotiable.

Red flags, ranked

Red flag	What it usually means	Severity
Portfolio is only demos and tutorials, nothing live	Has never operated anything; you fund their learning curve	High
"Evals? I just test the prompts manually"	No regression safety; every change is a gamble	High
Wants code/keys/repos in their accounts	Lock-in by design; ugly off-boarding	Disqualifying
Can't name a single production failure they've handled	Either inexperienced or not honest — both bad	High
Quotes a fixed price instantly, no discovery questions	Doesn't understand the problem, or plans to make scope your problem	Medium
Promises a specific accuracy number before seeing your data	Selling, not engineering	Medium
Every answer name-drops frameworks, none mentions trade-offs	Tutorial-level depth	Medium
No opinion on cost per request / unit economics	Has never paid a model bill at volume	Medium

One medium flag is a conversation topic. Two highs, or any disqualifier, means keep looking — the supply of freelance AI engineers is large; the supply of good ones is small but findable.

A fair fight: how to vet me

It would be convenient to stop here. But the standard has to apply to me too. If you're considering working with me, don't take the résumé on faith — Spotify and Klarna are real, but past employers vouch for nobody's freelance work, including mine.

Instead, vet me on proof-of-work:

Question 1: transcribe.so and goodlisten.co are live. Sign up, upload a file, try to break them. What you experience is my production engineering, unfiltered by a portfolio page.
Question 2: ask me about my eval setups on a call. I'll show you the golden corpus, not describe it.
Question 3: ask me what broke last quarter. I'll tell you, specifically, including the part that was my fault.
Question 4: your repo, your org, your keys, IP assigned on payment. Standard terms, in the contract.

This generalizes: good freelancers welcome hard vetting, because it's where they beat the demo-makers undercutting them on rate. If a candidate gets defensive under these questions, that's your answer too.

Frequently Asked Questions

How long should vetting a freelance AI engineer take?

One focused hour of conversation plus an hour of checking their live work is usually enough, because the four questions above produce strong signal fast. A small paid trial task (a few days, real scope, real codebase) is worth far more than a third interview. Skip unpaid "test projects": good engineers decline them, so you end up selecting for desperation.

Should I test them with a take-home coding challenge?

For AI work specifically, a generic LeetCode-style test measures almost nothing relevant. If you want a work sample, pay for a tightly scoped real task: "add an eval harness for this one prompt" or "instrument the cost per request on this endpoint." You'll see how they think about exactly the problems that matter.
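As a rough idea of the size of the second task, cost-per-request instrumentation can start as small as the sketch below. The model name and per-token prices are placeholders, not real provider pricing, and `ModelPrice`, `request_cost`, and `CostTracker` are hypothetical names; it also assumes the provider response reports input and output token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # USD per 1M input tokens (placeholder)
    output_per_million: float  # USD per 1M output tokens (placeholder)


# Hypothetical price table; fill it in from your provider's pricing page.
PRICES = {
    "example-model": ModelPrice(input_per_million=1.00, output_per_million=4.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, from the token counts the provider reports."""
    price = PRICES[model]
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


class CostTracker:
    """Accumulates per-request cost so an endpoint can report its average."""

    def __init__(self) -> None:
        self.total = 0.0
        self.requests = 0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        cost = request_cost(model, input_tokens, output_tokens)
        self.total += cost
        self.requests += 1
        return cost

    def average(self) -> float:
        return self.total / self.requests if self.requests else 0.0


if __name__ == "__main__":
    tracker = CostTracker()
    tracker.record("example-model", input_tokens=2000, output_tokens=500)
    tracker.record("example-model", input_tokens=1000, output_tokens=0)
    print(f"average cost per request: ${tracker.average():.4f}")
```

A candidate who extends this unprompted to retries, cached responses, or per-customer breakdowns is showing the kind of thinking the trial task is meant to reveal.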

What's a fair rate, and is cheaper riskier?

Experienced AI engineers who operate production systems typically charge $100–$250/hour or equivalent project rates, varying by region. Below that range you're mostly buying demo-makers, and the true cost shows up later as a rewrite. The expensive engineer who ships a maintainable system once is almost always cheaper than two cheap attempts.

What if I can't evaluate the technical answers myself?

Borrow judgment: have a technical friend or advisor sit in on one call, or hire a fractional CTO for a few hours to run the vetting. Failing that, weight question 1 and question 4 heavily — "show me what you operate" and "who owns the code" are verifiable by any founder, no ML background required.

If you'd rather start from proof-of-work than a stack of résumés, book a call and bring your hardest questions — I'll answer the same four I just gave you.
