sunmoon.dev

What I Learned Shipping Software at Spotify and Klarna

Seunghun Lee
Tags: engineering, Spotify, Klarna, leadership

Most engineering advice you read online was written by people who don't have to live with their own decisions. They ship a feature, write the blog post, and move teams before the bill comes due. I've spent the last decade on the other side of that: building things I then had to operate, get paged for, and explain when they broke at 2am.

This piece is about the habits I actually kept. Not the ones that sound good in a conference talk, but the few that survived contact with production at Spotify's Freemium org, with payment tooling at Klarna, with the chaos of a Y Combinator–backed startup, and now with two AI products I run alone.

The problem with most "senior engineer" advice

The trap is mistaking the size of the company for the source of the lesson. Big companies don't make you a better engineer by osmosis. What they give you — if you're paying attention — is a front-row seat to failure modes you'd never hit at small scale, and the chance to watch genuinely excellent people respond to them.

Here's what I look for when I'm deciding whether a habit is worth keeping: does it still make sense when I'm the only person who can fix the problem? Because that's my reality now. When something breaks in transcribe.so, there's no on-call rotation behind me. The habit either earns its place or it goes.

Spotify: optimize for the system, not the feature

I worked in Freemium — the part of Spotify responsible for converting free listeners into paying subscribers. It's a deceptively brutal domain. Every change you make ripples through experiments, billing, ad serving, and retention models simultaneously. You cannot reason about your feature in isolation, because there is no isolation.

The lesson that stuck: the unit of correctness is the system, not the commit.

At small scale you can hold the whole thing in your head, so you skip this. That's the mistake. The discipline I carried out of Spotify was treating every change as a question about second-order effects:

  • Who consumes the data I'm about to change shape on?
  • What experiment is silently depending on this code path?
  • If this deploys at 100% and is wrong, how do I know within minutes instead of days?

That last one matters more than people admit. Spotify's experimentation culture taught me that you don't ship a change, you ship a change plus the way you'll detect it was wrong. A feature without an observable signal isn't done — it's a liability you can't see.

The most expensive bugs aren't the ones that crash. They're the ones that quietly cost money for a week before anyone notices. Build the detector before you build the feature.

I now do this reflexively. Before I add a pricing experiment to goodlisten.co, I write down the metric that tells me it failed — conversion drop, refund spike, a queue backing up. If I can't name that signal, I'm not ready to ship.
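To make "name the signal before you ship" concrete, here is a minimal Python sketch. It is an illustration, not the actual goodlisten.co code: `FailureSignal`, `failed_signals`, the metric names, and the thresholds are all hypothetical. It declares each failure signal next to the experiment and treats a metric you cannot observe as a failure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FailureSignal:
    """A metric that, if it moves too far, says the change was wrong."""
    name: str
    baseline: float          # value before the change (assumed known)
    tolerance: float         # allowed relative drift, e.g. 0.10 for 10%
    higher_is_worse: bool    # True for refund rate, False for conversion

    def tripped(self, observed: float) -> bool:
        if self.higher_is_worse:
            return observed > self.baseline * (1 + self.tolerance)
        return observed < self.baseline * (1 - self.tolerance)


def failed_signals(signals: list[FailureSignal],
                   observed: dict[str, float]) -> list[str]:
    """Return the names of signals that tripped.

    A signal with no observed value counts as tripped: if the metric
    cannot be seen, the change cannot be shown to be safe.
    """
    failed = []
    for signal in signals:
        value = observed.get(signal.name)
        if value is None or signal.tripped(value):
            failed.append(signal.name)
    return failed


# Hypothetical signals for a pricing experiment.
signals = [
    FailureSignal("conversion_rate", baseline=0.040, tolerance=0.10,
                  higher_is_worse=False),
    FailureSignal("refund_rate", baseline=0.010, tolerance=0.50,
                  higher_is_worse=True),
]
```

Calling `failed_signals(signals, {"conversion_rate": 0.035, "refund_rate": 0.012})` flags only the conversion drop. The design choice worth noting is the missing-metric rule, which encodes "if I can't name that signal, I'm not ready to ship" as a default rather than a reminder.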

Klarna: tooling is a product, and integrations are where trust lives

At Klarna I worked on Tooling Product Integrations — the connective tissue that lets internal teams and external partners plug into a payments platform without setting it on fire. Payments is unforgiving in a specific way: the cost of an error isn't a bad user experience, it's real money moving incorrectly, and a regulator who'd like a word.

Two things calcified into permanent habits there.

Internal tooling deserves product thinking

Engineers love to treat internal tools as second-class — throwaway scripts, undocumented endpoints, "the other team will figure it out." Klarna disabused me of that fast. When your integration is the thing fifty other teams build on, its API is a promise, and breaking it is breaking trust at scale.

I treat my own internal surfaces this way now. The admin tooling behind transcribe.so — the dashboards I use to debug a stuck transcription job — gets the same care as the customer-facing app, because future-me is the most important user I have.

Idempotency and reconciliation are not optional

In payments you assume every request will be retried, every network call will time out at the worst moment, and every "exactly once" is a lie you tell yourself. So you design for it: idempotency keys, reconciliation jobs, the ability to replay a day's events and arrive at the same answer.

This sounds like enterprise overhead until you run an AI product with a usage-metered billing model and a transcription pipeline that fans work out to a worker fleet. Then it's just survival. Every job in my systems carries an idempotency key. Every billing event can be reconciled against the source of truth. I learned that from moving other people's money before I had to meter my own.
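A minimal sketch of those two ideas in Python, under stated assumptions: `BillingEvent`, `UsageLedger`, and `reconcile` are hypothetical names, the ledger is an in-memory dict standing in for a durable store, and usage is metered in seconds purely for illustration. It shows a retried event being applied once, and a reconciliation pass comparing billed totals against a source of truth.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BillingEvent:
    idempotency_key: str      # e.g. derived from the job id (assumption)
    customer_id: str
    seconds_transcribed: int


class UsageLedger:
    """In-memory stand-in for a durable ledger keyed by idempotency key."""

    def __init__(self) -> None:
        self._events: dict[str, BillingEvent] = {}

    def record(self, event: BillingEvent) -> bool:
        """Apply the event once. Return False if it was a harmless retry."""
        existing = self._events.get(event.idempotency_key)
        if existing is not None:
            if existing != event:
                # Same key, different payload: a bug upstream, not a retry.
                raise ValueError("idempotency key reused with a different payload")
            return False
        self._events[event.idempotency_key] = event
        return True

    def totals(self) -> dict[str, int]:
        """Billed seconds per customer."""
        totals: dict[str, int] = {}
        for event in self._events.values():
            totals[event.customer_id] = (
                totals.get(event.customer_id, 0) + event.seconds_transcribed
            )
        return totals


def reconcile(billed: dict[str, int],
              source: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {customer: (billed, source)} for every customer that disagrees."""
    mismatches = {}
    for customer in sorted(set(billed) | set(source)):
        pair = (billed.get(customer, 0), source.get(customer, 0))
        if pair[0] != pair[1]:
            mismatches[customer] = pair
    return mismatches
```

Because `record` ignores exact replays, feeding the same day's events through the ledger twice, or in a different order, yields the same totals. An empty result from `reconcile` then means billing agrees with the source of truth.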

The YC startup: speed is a skill, not an excuse

The startup taught me the opposite-seeming lesson, and holding both is the whole game. At a company burning runway, the cost of being slow is existential, and the cost of being wrong is usually recoverable. That's the inverse of payments, and you have to actually internalize the asymmetry rather than apply one mode everywhere.

What "fast" actually means, as a skill:

  • Ship the smallest thing that produces a real signal, then decide.
  • Delete code aggressively — every line is a liability you maintain.
  • Don't gold-plate something you might kill in two weeks.
  • Know which decisions are one-way doors and slow down only for those.

The trap is treating speed as permission to be sloppy. It isn't. Fast engineering is precise about what it skips. You make the cut deliberately and you write down what you deferred, so it's a decision and not an accident.

How these collide in a solo AI studio

Running two AI products alone forces you to blend all three modes, because there's no team to specialize. Here's roughly how I weigh them depending on what I'm touching:

| Decision type | Default mode | Why |
|---|---|---|
| Billing, usage metering, payments | Klarna: idempotent, reconcilable | Errors move money; recovery must be guaranteed |
| Pricing & conversion experiments | Spotify: signal before ship | Effects are systemic and easy to miss |
| New feature, unproven demand | YC startup: smallest real signal | Being slow costs more than being wrong |
| Core transcription pipeline | All three at once | It's the product; it must be fast and correct |

The point isn't that one mode wins. It's that the failure mode is applying the wrong mode to the wrong problem: shipping a billing change with startup speed, or gold-plating an unvalidated feature with payments-grade rigor.

The one habit underneath all of them

If I compress everything into a single sentence: make your systems legible to your future self.

Every habit above is really about that. Observability is legibility under load. Idempotency is legibility under failure. Shipping small is legibility about whether an idea worked. When you operate your own software, you are constantly handed a mess by a past version of yourself who knew the context you've now forgotten. The engineers I respected most at every company I worked at were the ones whose systems explained themselves.

That's the bar I hold my own products to. Not "is this clever," but "will I understand this when it pages me in six months and the only documentation is the code."

Frequently Asked Questions

Do you need to work at a big company to learn these lessons?

No — but big companies give you exposure to failure modes that are rare at small scale, which accelerates the learning. The same habits are available anywhere if you operate your own software long enough to live with your decisions. Operating is the real teacher; the company name just speeds it up.

How do you apply payments-grade rigor without slowing everything down?

You don't apply it everywhere — that's the mistake. I reserve idempotency and reconciliation for anything that moves money or can't be safely retried, and let lower-stakes features ship fast. Knowing which mode a problem deserves is the actual skill, not defaulting to maximum rigor.

What's the single most underrated engineering habit?

Building the detector before the feature. A change without an observable signal that tells you it failed isn't finished — it's a silent liability. Spotify's experimentation culture drilled this into me, and it's the habit I'd transplant first into any team.

How does this shape the AI products you build now?

Directly. The billing in my products is idempotent and reconcilable because of Klarna, the pricing experiments carry failure signals because of Spotify, and the feature roadmap ships in small validated cuts because of the startup. The studio is essentially these habits applied to AI products I have to operate alone.


If you're building or scaling an AI product and want a second set of eyes from someone who's shipped at scale and now operates solo, book a call.
