What makes sunmoon.dev different from an agency?

You work directly with me — a SaaS founder who builds and operates his own AI products (transcribe.so, goodlisten.co). There's no account manager, no junior bench, and no hand-off. The person who scopes your build is the person who ships it.

How long does it take to build an AI application?

It depends on complexity. A basic AI agent can ship in 4–6 weeks; more involved solutions with custom features take 3–6 months. You'll get a concrete timeline on the first call before any commitment.

Do you provide ongoing support after launch?

Yes. Every build includes 6 months of email support, and I offer a discounted on-call package for extended maintenance and feature work as you scale.

What technologies do you use?

Industry-standard, mostly open tooling to avoid vendor lock-in: Next.js and Node for the app layer, Postgres/Supabase for data, and the right AI models picked per task (OpenAI, Qwen, Mistral, and others). The exact stack is chosen around your requirements.

How do you handle data privacy and security?

Encryption in transit and at rest, scoped access, and compliance-aware design. Where it matters, solutions can be deployed on your own infrastructure so you keep full control of your data.

How does pricing work?

Transparent and upfront. Consulting starts at $200/hour; AI agent and full-stack SaaS builds start at $10,000, with final cost depending on scope. You get a detailed quote after the first call.

Why Your RAG App Needs Cited Answers

You shipped a RAG feature. It demos beautifully. Then a user asks a question, gets a confident paragraph back, and quietly wonders: is any of this actually true? They have no way to check, and neither do you. That gap — between an answer and the evidence for it — is why RAG features stall out after the demo.

I've built retrieval-augmented Q&A into my own products, and the lesson keeps repeating: the answer is not the product. The cited answer is the product. Here's why, and how I implement it.

The problem with a naked answer

A RAG system retrieves chunks, stuffs them into a prompt, and asks a model to synthesize. The output reads like authority. But strip away the citations and you've built a very expensive way to launder hallucinations into prose your users will trust by default.

The failure modes are predictable:

The model invents a fact that wasn't in any retrieved chunk, and nobody catches it because there's nothing to check against.
Retrieval pulled the wrong passage, the model answered faithfully from bad context, and the answer is wrong for reasons no one can see.
The answer is right but the user doesn't believe it — and an answer your user won't act on has zero value.

Every one of these dissolves the moment you attach a source the user can open and read.

What citations actually buy you

Source attribution changes the economics of the whole system, and most of the payoff lands in places a "polish" feature never touches: trust, accuracy, debugging, and compliance.

Trust

People don't trust black boxes with decisions that matter. When I worked at Klarna, every number that touched a customer's credit decision had to trace back to a record — not because someone liked footnotes, but because an unverifiable claim about someone's money is a non-starter. RAG answers are the same. A citation turns "trust me" into "check for yourself," and that shift is what gets a feature out of the toy bucket.

Hallucination reduction

This one surprises people: requiring citations doesn't just make hallucinations easier to spot, it makes the model produce fewer of them. When the generation step is constrained to ground every claim in a retrieved span, the model has less room to free-associate. You're not only catching fabrications after the fact; you're structurally discouraging them. A model asked to cite is a model on a shorter leash.
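To make the grounding constraint concrete, here is a minimal sketch of a prompt builder plus a post-check that flags sentences the model failed to ground. The bracketed tag format ([S1]), the NOT_SUPPORTED sentinel, and both function names are assumptions made for illustration, not a description of any particular product's prompt.

```python
import re
from typing import Dict, List

# Hypothetical conventions for this sketch: spans are labeled S1, S2, ...
# and the model is told to reply NOT_SUPPORTED when nothing backs a claim.
CITATION_TAG = re.compile(r"\[(S\d+)\]")
REFUSAL = "NOT_SUPPORTED"


def build_grounded_prompt(question: str, spans: Dict[str, str]) -> str:
    """Expose only labeled spans and require a tag on every sentence."""
    context = "\n".join(f"[{span_id}] {text}" for span_id, text in spans.items())
    return (
        "Answer using ONLY the spans below. Tag every sentence with the span "
        "that supports it, like this: 'The launch moved to March [S1].' "
        f"If no span supports an answer, reply exactly: {REFUSAL}\n\n"
        f"Spans:\n{context}\n\nQuestion: {question}"
    )


def ungrounded_sentences(answer: str, spans: Dict[str, str]) -> List[str]:
    """Return sentences that cite nothing, or cite a span that was never retrieved."""
    if answer.strip() == REFUSAL:
        return []  # an explicit refusal is a valid, grounded outcome
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    flagged = []
    for sentence in sentences:
        cited = CITATION_TAG.findall(sentence)
        if not cited or any(span_id not in spans for span_id in cited):
            flagged.append(sentence)
    return flagged
```

A check like this only proves that a tag exists and points at a retrieved span. It does not prove the span actually supports the sentence, so it narrows the leash rather than replacing human or automated verification.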

Debugging

When an uncited answer is wrong, you're guessing. Was it retrieval? Ranking? The prompt? The model? With citations, the answer tells you where it came from, so you can immediately see whether the right chunk was retrieved and the model simply misread it, or the wrong chunk was retrieved and the model did its job faithfully. That single signal cuts debugging time more than any eval dashboard I've built.
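As a rough illustration of that triage, the sketch below classifies a wrong answer once someone has identified the chunk that should have supported it. The function name, the outcome labels, and the idea of a known `expected_id` are all assumptions for the example.

```python
from typing import List


def diagnose(retrieved_ids: List[str], cited_ids: List[str], expected_id: str) -> str:
    """Classify a wrong answer, given the chunk that should have supported it."""
    if expected_id not in retrieved_ids:
        # The right chunk never reached the model: look at retrieval or ranking.
        return "retrieval"
    if expected_id not in cited_ids:
        # The right chunk was available, but the model leaned on something else.
        return "generation_ignored_source"
    # The right chunk was retrieved and cited, so the model misread it.
    return "generation_misread_source"
```

For example, `diagnose(["c1", "c2"], ["c1"], "c7")` returns `"retrieval"`, which points you at the index or the ranker before you touch the prompt.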

Compliance

In regulated or high-stakes contexts, "the AI said so" is not a defensible answer. An auditor, a lawyer, or a careful customer wants the provenance. Citations double as an audit trail — every answer carries a pointer back to the source document and location.

How transcribe.so implements cited Q&A

I'll get concrete, because the architecture matters more than the principle. When I built the Q&A layer in transcribe.so, the core constraint was that every answer about a transcript must point back to the exact moment in the audio where the claim is supported — not the document or the paragraph, the timestamp.

Here's the pipeline:

Chunk with provenance baked in. When transcribe.so processes audio, every transcript segment carries its start and end timestamp from the moment it's created. You can't cite what you didn't track, so the pointer has to exist from ingestion onward — not get bolted on later.
Retrieve with the citation attached. When a user asks a question, retrieval returns chunks that still carry their timestamps. Nothing gets flattened into anonymous text.
Generate with grounding required. The generation prompt instructs the model to answer only from the retrieved spans and to tag each claim with the segment it came from. If a claim can't be grounded, the system says so rather than papering over the gap.
Render answers that jump to the source. This is the part users feel. Each cited answer in transcribe.so links to the exact timestamp — click it and the player jumps straight to the moment in the audio. The user verifies in one click instead of scrubbing through an hour of recording.
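The first two steps can be sketched in a few lines. This is an illustrative outline under stated assumptions, not transcribe.so's actual code: the `Segment` fields, the function names, and the word-overlap scoring (a stand-in for real embedding search) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A transcript chunk that carries its provenance from creation onward."""
    segment_id: str
    recording_id: str
    start_s: float  # offset into the audio, in seconds
    end_s: float
    text: str


def retrieve(question: str, segments: List[Segment], k: int = 2) -> List[Segment]:
    """Toy retrieval that ranks by shared words.

    A real system would use embeddings; the point here is the return type:
    whole Segment objects with timestamps intact, never bare strings.
    """
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(s.text.lower().split())), s) for s in segments]
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in ranked[:k]]


def format_context(hits: List[Segment]) -> str:
    """Label each span with its ID so the model can tag claims with it."""
    return "\n".join(f"[{s.segment_id}] {s.text}" for s in hits)
```

Because `retrieve` hands back `Segment` objects, the timestamp is still available when the answer is rendered, with no second lookup to reattach it.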

That last step is the whole game. A citation that drops you at 14:32 of a two-hour interview is a verification tool; one you can't act on is decoration.
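Turning a stored offset into something clickable takes very little code. The sketch below assumes offsets are stored in seconds; the URL shape and the `t` query parameter are invented for illustration, and a real player would define its own deep-link format.

```python
from urllib.parse import urlencode


def format_timestamp(seconds: float) -> str:
    """Render an offset as M:SS, or H:MM:SS once it passes an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def citation_link(base_url: str, recording_id: str, start_s: float) -> str:
    """Build a deep link the player can use to seek (hypothetical URL scheme)."""
    query = urlencode({"t": int(start_s)})
    return f"{base_url}/recordings/{recording_id}?{query}"
```

With these, an offset of 872 seconds renders as `14:32`, and `citation_link("https://example.com", "rec-1", 872.4)` yields `https://example.com/recordings/rec-1?t=872`.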

Cited vs. uncited RAG, side by side

Dimension	Uncited answer	Cited answer (jump-to-source)
User trust	"I hope this is right"	"I can check in one click"
Wrong-answer cost	Silent, propagates	Caught at the source
Debugging signal	Guesswork	Retrieval vs. generation isolated
Compliance posture	Indefensible	Audit trail built in
Build cost	Lower upfront	Modestly higher, pays back fast

The right column costs more engineering up front; it starts paying that back the first time a user disputes an answer and you can settle it with a click.

The part most teams get wrong

The mistake I see most often: treating citations as a UI feature you add at the end. By then it's too late. If your chunks don't carry provenance from ingestion, you can't bolt it on afterward without re-indexing; you'd be guessing where each sentence came from, which is exactly the problem you were trying to solve.

Provenance is an architectural decision, not a display option. At a YC-backed startup I worked with, the features that were cheap to bolt on late were the ones the data model had been designed for on day one. Citations are that kind of feature. Bake the source pointer into the chunk at creation time, carry it through retrieval untouched, and let the model reference it. The UI is the easy last mile.

I apply the same discipline in goodlisten.co: when content is generated or recommended from underlying sources, the chain back to those sources stays intact rather than getting flattened into an opaque blob. Building data-heavy systems at Spotify taught me that the metadata you throw away early is the metadata you end up wanting back — and provenance tops that list.

A practical checklist

If you're adding cited Q&A to a RAG app, here's what I'd verify before shipping:

Every chunk carries a stable, specific source pointer (document ID and location — page, timestamp, offset) from ingestion.
Retrieval preserves that pointer end to end; nothing strips it.
The generation prompt requires grounding and has a defined behavior for "I can't support this claim."
The citation is clickable and lands the user at the exact location, not the top of the document.
You log which chunks were retrieved per answer, so debugging is reading logs, not re-running guesses.
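For the last item, one possible shape for a per-answer log record is sketched below. The field names (`id`, `doc_id`, `location`) and the JSON-lines style are assumptions for the example; adapt them to whatever your chunks already carry.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def answer_log_record(
    question: str, answer: str, retrieved: List[Dict], cited_ids: List[str]
) -> str:
    """Serialize one answer with enough context to replay the decision later."""
    retrieved_ids = [chunk["id"] for chunk in retrieved]
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        # Keep pointers only; chunk text can be looked up by ID when needed.
        "retrieved": [
            {"id": c["id"], "doc_id": c["doc_id"], "location": c["location"]}
            for c in retrieved
        ],
        "cited_ids": cited_ids,
        # A citation to a chunk that was never retrieved is a red flag.
        "cited_but_not_retrieved": [c for c in cited_ids if c not in retrieved_ids],
    }
    return json.dumps(record)
```

Writing one such line per answer means a disputed answer can be investigated from the log alone, and a non-empty `cited_but_not_retrieved` list surfaces fabricated citations without any manual review.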

Hit those five and you've moved from a RAG demo to a RAG product.

Frequently Asked Questions

Don't citations slow down or complicate the build?

Modestly, yes — you have to carry provenance through the whole pipeline instead of just generating text. But the cost is front-loaded and small relative to what it saves. The first time a citation lets you isolate a retrieval bug in minutes instead of an afternoon, it's paid for itself.

Will requiring citations make answers worse or more cautious?

In my experience it makes them better. Grounding the model in retrieved spans reduces fabrication and tightens relevance. Answers do get more honest about what they can't support, and that's the point — a confident wrong answer is the worst output a Q&A system can produce.

Can I add citations to an existing RAG app that wasn't built for them?

Partially, and it depends on whether your chunks retained source metadata. If they did, you can surface it. If your ingestion discarded provenance, you'll likely need to re-index with source pointers as first-class fields. That re-index is the real cost of deferring the decision.
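A quick way to size that re-index is to scan the existing chunks and separate the ones that kept a usable pointer from the ones that did not. The sketch assumes each chunk is a dictionary with an optional `metadata` mapping, and the required field names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical field names; use whatever your ingestion was supposed to store.
REQUIRED_FIELDS = ("doc_id", "location")


def split_by_provenance(chunks: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition chunks into (citable, needs_reindex) based on their metadata."""
    citable: List[Dict] = []
    needs_reindex: List[Dict] = []
    for chunk in chunks:
        meta = chunk.get("metadata") or {}
        # Treat None and empty strings as missing; an offset of 0 is still valid.
        if all(meta.get(field) not in (None, "") for field in REQUIRED_FIELDS):
            citable.append(chunk)
        else:
            needs_reindex.append(chunk)
    return citable, needs_reindex
```

The ratio between the two lists tells you whether you are looking at a display change or a full re-ingestion.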

How granular should a citation be?

As granular as the user needs to verify in one action. A document-level link is barely better than none. In transcribe.so I cite to the exact timestamp so one click confirms the claim. Aim for the smallest unit a user can check without hunting.

If you're building a RAG feature and want it to survive contact with real users — not just demo well — book a call and I'll help you design citations in from the start.
