recipe/.planning/research/PITFALLS.md
2026-04-28 21:41:52 +02:00


Pitfalls Research

Domain: Kotlin Multiplatform + Compose Multiplatform (iOS-primary), Ktor/Exposed/Postgres, OIDC, LWW delta sync
Researched: 2026-04-23
Confidence: HIGH for KMP/Ktor/Exposed gotchas; MEDIUM for Haze + Navigation-CMP specifics (behavior shifts across minor versions)

Critical Pitfalls

Pitfall 1: Kotlin/Native iOS GC thrashing and objcDisposeOnMain hangs

What goes wrong: On-device (especially iPhone XR/11) the app consumes 300–700 MB steadily and freezes for 1–2 s under ViewModel churn. Flamegraphs show GC threads at >100% CPU.

Why: The K/N memory manager dispatches Obj-C release to the main thread by default, serializing teardown behind UI frames. Compose/Koin graphs produce many bridged Obj-C references per navigation.

Warning signs: Frame hitches on tab switches; main-thread time in objc_release / Kotlin_ObjCExport_releaseReservedObjectTail; Instruments shows growing K/N heap.

How to avoid: Set kotlin.native.binary.objcDisposeOnMain=false and kotlin.native.binary.gc=cms in gradle.properties from day 1. Release Kotlin refs in onDispose; don't hold them in long-lived Swift closures.
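If the flags should live next to the target definition rather than in gradle.properties, the same two binary options can be sketched in the Gradle Kotlin DSL. This is a minimal fragment, assuming the Kotlin Multiplatform Gradle plugin is applied to the shared module; it is equivalent in intent to the properties above, not an additional requirement.

```kotlin
// build.gradle.kts (shared module). Sketch only: assumes the Kotlin Multiplatform plugin is applied.
import org.jetbrains.kotlin.gradle.plugin.mpp.KotlinNativeTarget

kotlin {
    targets.withType<KotlinNativeTarget>().configureEach {
        binaries.all {
            // Release Obj-C references off the main thread.
            binaryOptions["objcDisposeOnMain"] = "false"
            // Opt in to the concurrent-marking GC.
            binaryOptions["gc"] = "cms"
        }
    }
}
```

Use one mechanism or the other so the effective value is obvious when reading the build.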

Phase: UI chrome.


Pitfall 2: Legacy freeze() / strict-mm ceremony in copy-pasted snippets

What goes wrong: Code from 2021–2022 tutorials adds freeze(), @SharedImmutable, AtomicReference from kotlin.native.concurrent, or ensureNeverFrozen(). Compiles on Kotlin 2.x but adds dead code and masks real bugs.

Why: The new memory manager removed the freeze paradigm entirely; freeze() is a no-op and deprecated.

Warning signs: Any of the above symbols appearing in snippets you're about to paste.

How to avoid: Reject pre-1.7.20 KMP code. Use kotlinx.atomicfu if you truly need atomics; StateFlow is already thread-safe.

Phase: Data.


Pitfall 3: ComposeUIViewController state loss on iOS re-entry

What goes wrong: Backgrounding then returning resets scroll positions, selected tabs, half-filled forms. Koin-scoped ViewModels re-create.

Why: If the UIViewController is instantiated inside a SwiftUI body, each re-render builds a fresh composition. Compose state is owned by the controller's composition root.

Warning signs: State survives Android rotation but dies on iOS foreground-return; ViewModel init fires on backgrounded return.

How to avoid: Build the UIViewController once — store in @StateObject or a top-level property, not in a SwiftUI body. Use rememberSaveable for any UI state that must survive process death. Never nest multiple ComposeUIViewController wrappers.

Phase: UI chrome.


Pitfall 4: SQLDelight iOS — missing migration files, in-memory vs file driver divergence

What goes wrong: JVM tests pass with in-memory driver; the iOS app crashes on launch with no such column after a schema change.

Why: NativeSqliteDriver persists a real file. If you edit a .sq file without adding a numbered .sqm migration, the schema version does not change, so on a device with an existing install the driver runs no migration and the new queries hit the old on-disk schema.

Warning signs: Works on fresh simulator install; breaks on physical device with prior install; Android OK, iOS fails.

How to avoid: Every schema change gets a numbered migration file (1.sqm, 2.sqm, ...). Enable verifyMigrations = true and verifyDefinitions = true. Add a dev-only "wipe DB" debug button during early development. Reinstall on device before any QA.

Phase: Data.


Pitfall 5: Exposed transaction {} inside suspend functions → pool exhaustion

What goes wrong: Plain transaction { ... } in Ktor handlers. Under modest concurrency (~20 requests) the pool exhausts, p99 cliffs, and IllegalStateException: Transaction is not currently active appears.

Why: transaction {} is blocking and binds the transaction to the calling thread. In a coroutine it blocks event-loop threads; if the code suspends mid-transaction, resume lands on a different thread and loses the JDBC connection binding.

Warning signs: Connection pool always fully leased at low RPS; latency cliffs; "transaction not active" in logs.

How to avoid: Use newSuspendedTransaction(Dispatchers.IO) { ... } in suspend contexts. Pass the Database instance explicitly. No HTTP calls inside transactions. HikariCP pool size 8–10 is plenty for 5–10 users.
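One way to make the suspend-safe path the default is a single helper that every handler goes through. The sketch below assumes Exposed's pre-1.0 package layout (org.jetbrains.exposed.sql.*); the helper name dbQuery is hypothetical.

```kotlin
import kotlinx.coroutines.Dispatchers
import org.jetbrains.exposed.sql.Database
import org.jetbrains.exposed.sql.Transaction
import org.jetbrains.exposed.sql.transactions.experimental.newSuspendedTransaction

// Hypothetical helper: runs the block in a suspended transaction on Dispatchers.IO,
// against an explicitly passed Database, so no handler blocks an event-loop thread.
suspend fun <T> dbQuery(db: Database, block: suspend Transaction.() -> T): T =
    newSuspendedTransaction(context = Dispatchers.IO, db = db, statement = block)
```

A route handler would then call dbQuery(db) { ... } and do any HTTP calls before or after it, never inside the block.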

Phase: Data.


Pitfall 6: Exposed DAO + JSONB footguns

What goes wrong: IntEntity + jsonb<T>() produces double-serialized JSON in Postgres ("{\"key\":\"v\"}") or SerializationException on read.

Why: DAO integration with JSONB is thin; it's easy to store a pre-stringified value. DAO lazy-loads hide when the column is read, so failures manifest far from the cause.

Warning signs: Escaped JSON in psql output; serialization errors deep in read paths.

How to avoid: Use DSL only (already locked in PROJECT.md). For JSONB, define jsonb("extras", Json.Default, MealExtras.serializer()) once; never stringify upstream. Round-trip integration test per JSONB column.

Phase: Data.


Pitfall 7: Ktor JWT — audience, issuer, clock skew, JWKS cache

What goes wrong: 401s in production only, after a while, or after Authentik restart. Messages: "Token can't be used before...", "Claim 'aud' doesn't contain required audience", or silent 401s post key-rotation.

Why: Four defaults converge:

  1. ktor-server-auth-jwt requires explicit .withAudience() / .withIssuer().
  2. Default clock leeway is zero — 2 s device drift rejects fresh tokens.
  3. JWKS cache is commonly copied from the Ktor sample as cached(10, 24, HOURS), so key rotation can stay invisible for hours.
  4. Authentik's aud can be array or string depending on provider config.

Warning signs: 401 only in prod; 401 only on some devices; works briefly then fails; 401 after Authentik restart.

How to avoid: Configure .withIssuer(issuer).withAudience(clientId).acceptLeeway(30). JWKS provider with .cached(10, 15, MINUTES).rateLimited(10, 1, MINUTES). In Authentik, emit aud as a single client_id string. Integration test: wrong aud → 401.
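The settings above could be wired together roughly as follows. This is a sketch, assuming ktor-server-auth-jwt with the Auth0 jwks-rsa provider; the function name configureJwt, the provider name "oidc", and the jwksUrl parameter are hypothetical, and the real JWKS URL comes from the Authentik provider configuration.

```kotlin
import com.auth0.jwk.JwkProviderBuilder
import io.ktor.server.application.Application
import io.ktor.server.application.install
import io.ktor.server.auth.Authentication
import io.ktor.server.auth.jwt.JWTPrincipal
import io.ktor.server.auth.jwt.jwt
import java.net.URI
import java.util.concurrent.TimeUnit

// Hypothetical wiring; issuer, jwksUrl and clientId come from configuration.
fun Application.configureJwt(issuer: String, jwksUrl: String, clientId: String) {
    val jwkProvider = JwkProviderBuilder(URI(jwksUrl).toURL())
        .cached(10, 15, TimeUnit.MINUTES)      // pick up rotated keys within 15 minutes
        .rateLimited(10, 1, TimeUnit.MINUTES)  // cap JWKS fetches
        .build()

    install(Authentication) {
        jwt("oidc") {
            // This overload checks the issuer; audience and leeway are set in the block.
            verifier(jwkProvider, issuer) {
                withAudience(clientId)
                acceptLeeway(30) // seconds of tolerated clock skew
            }
            validate { credential ->
                // Reject tokens without a subject; everything else becomes a principal.
                if (credential.payload.subject != null) JWTPrincipal(credential.payload) else null
            }
        }
    }
}
```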

Phase: Auth.


Pitfall 8: OIDC redirect URI mismatch + missing PKCE

What goes wrong: "redirect_uri does not match" or consent loop on one platform; or login succeeds without PKCE and is interceptable.

Why: Native apps are public clients — no shippable secret, so Authentik requires PKCE. Redirect URIs must match byte-for-byte (trailing slash, case). iOS uses a custom URL scheme or Universal Link; Android uses an intent-filter. Debug and release builds can differ.

Warning signs: Works on Android, fails on iOS (or vice versa); Authentik logs show invalid_grant; no code_challenge in auth request; fails on release build only.

How to avoid: Authentik provider = "Public" + PKCE S256. Register both recipe://callback and recipe://callback/. AppAuth on both platforms — Kotlin actual on Android, Swift AuthBridge (over AppAuth-iOS via SwiftPM) called from iosMain on iOS — with usePKCE = true. Keep the redirect URI in one constant in shared/commonMain.
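AppAuth generates the PKCE pair itself, so the following is not something the app needs to ship. It is a JVM-only sketch of what S256 means, useful when checking that the code_challenge seen in Authentik logs matches the verifier a client claims to have used. The function names are hypothetical.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom
import java.util.Base64

// 32 random bytes, base64url without padding, gives a 43-character verifier.
fun newCodeVerifier(random: SecureRandom = SecureRandom()): String {
    val bytes = ByteArray(32).also(random::nextBytes)
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes)
}

// S256: code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))), no padding.
fun codeChallengeS256(verifier: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(verifier.toByteArray(Charsets.US_ASCII))
    return Base64.getUrlEncoder().withoutPadding().encodeToString(digest)
}
```

The test vector in RFC 7636 Appendix B is a convenient sanity check for codeChallengeS256.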

iOS bridge gotcha: the Swift AuthBridge instance must be set on IosAuthBridgeRegistry.shared.instance from iOSApp.init before KoinIosKt.doInitKoin() runs — otherwise Koin's single<IosAuthBridge> fails on first auth call. Do not try to instantiate AppAuth from pure Kotlin: there is no cocoapods.AppAuth.* available since 2026-04-28.

Phase: Auth.


Pitfall 9: LWW trusting client clocks

What goes wrong: User A's phone clock is 90 s fast; A's edit beats B's real-time-later edit in LWW. B's change silently disappears.

Why: Client-assigned timestamps trust unverifiable clocks. Even NTP-synced devices drift; simulators can be minutes off.

Warning signs: "My edit vanished"; stable prior state reappears; most common with both household members editing the same meal.

How to avoid: Server assigns updated_at on every write (already in PROJECT.md — enforce it). Client sends only content + prior updated_at for optimistic concurrency. Server sets updated_at = now() in the transaction and returns it. Make timestamps strictly monotonic per row (e.g. GREATEST(now(), old.updated_at + interval '1 microsecond')) to avoid tie collisions.
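The GREATEST(...) rule can also be expressed on the server in Kotlin, for example to unit-test the tie-breaking without a database. This is a minimal sketch mirroring the SQL expression above; the function name nextUpdatedAt is hypothetical, and it assumes java.time on the JVM backend.

```kotlin
import java.time.Instant
import java.time.temporal.ChronoUnit

// Returns the timestamp to store for a write: the server clock, but never
// less than one microsecond after the row's previous updated_at.
fun nextUpdatedAt(serverNow: Instant, previous: Instant?): Instant {
    // timestamptz keeps microseconds, so drop finer precision up front.
    val now = serverNow.truncatedTo(ChronoUnit.MICROS)
    if (previous == null) return now
    val floor = previous.plus(1, ChronoUnit.MICROS)
    return if (now.isAfter(floor)) now else floor
}
```

Because the client never supplies the stored value, a fast phone clock cannot win a conflict; it can only send a stale precondition that the server rejects.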

Phase: Sync.


Pitfall 10: Soft-delete + recreate race

What goes wrong: Delete a meal entry, immediately re-add "the same" one. Depending on pull ordering, the new row is hidden by the tombstone, or the old row is resurrected with old fields.

Why: If (plan_date, slot) is treated as identity, tombstone/recreate races are inevitable on concurrent 2-user editing.

Warning signs: Undeleted items; deleted meals reappear on partner's device; duplicates in pantry.

How to avoid: Identity is always a fresh UUID per row, never (date, slot). Tombstones carry their own updated_at. Pull returns tombstones and live rows; client applies in updated_at order. Per-client push outbox replays in local sequence order — never parallel. Integration test: two clients alternating delete/recreate, assert convergence.
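The client-side apply step can be sketched in plain Kotlin. The types and names below (RemoteRow, applyPull, visibleRows) are hypothetical and stand in for whatever the SQLDelight layer provides; the point is only that tombstones and live rows go through the same ordered, newest-wins path.

```kotlin
import java.time.Instant

// Hypothetical wire row: a tombstone is a row with deleted = true and its own updatedAt.
data class RemoteRow(
    val id: String,
    val updatedAt: Instant,
    val deleted: Boolean,
    val title: String? = null,
)

// Applies a pulled batch in (updatedAt, id) order; a row only replaces the local
// copy if it is strictly newer, so replayed or stale rows cannot resurrect data.
fun applyPull(local: MutableMap<String, RemoteRow>, pulled: List<RemoteRow>) {
    val ordered = pulled.sortedWith(compareBy<RemoteRow>({ it.updatedAt }, { it.id }))
    for (row in ordered) {
        val current = local[row.id]
        if (current == null || row.updatedAt > current.updatedAt) local[row.id] = row
    }
}

// Tombstones stay in the store (so they keep winning against stale rows) but are hidden.
fun visibleRows(local: Map<String, RemoteRow>): List<RemoteRow> =
    local.values.filter { !it.deleted }
```

With fresh UUIDs, "delete then re-add the same meal" is a tombstone for one id plus a live row for another, so neither can mask the other.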

Phase: Sync.


Pitfall 11: Pull-cursor edge cases — missed updates, same-timestamp ties

What goes wrong: Partner edits at 14:00:05; client's last pull cursor is 14:00:04.999. If cursor semantics or timestamp precision are wrong, the change is skipped forever.

Why: Cursor semantics are subtle. Second-precision timestamps, >= instead of >, and ties among rows sharing an updated_at all cause skipped or replayed rows. Debounced push interleaved with pull can reorder writes.

Warning signs: Sporadic stale data that vanishes after pull-to-refresh; only reproduces near DB restarts or bulk imports; duplicates after manual refresh.

How to avoid: updated_at is timestamptz with microsecond precision and strictly monotonic. Cursor is (updated_at, id) lexicographic: WHERE (updated_at, id) > (:since_ts, :since_id) ORDER BY updated_at, id LIMIT N. Pause pull while a push is in flight. Never split the write and its timestamp notification across transactions.
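The row-comparison cursor can be modelled in memory to test tie handling before touching SQL. This is a sketch with hypothetical names (SyncCursor, SyncRow, pullPage); it compares ids as strings, which may order differently from Postgres uuid comparison, so it illustrates the semantics rather than reproducing the exact server ordering.

```kotlin
import java.time.Instant

// Lexicographic (updatedAt, id) cursor, mirroring WHERE (updated_at, id) > (:since_ts, :since_id).
data class SyncCursor(val updatedAt: Instant, val id: String) : Comparable<SyncCursor> {
    override fun compareTo(other: SyncCursor): Int =
        compareValuesBy(this, other, { it.updatedAt }, { it.id })
}

data class SyncRow(val id: String, val updatedAt: Instant) {
    val cursor: SyncCursor get() = SyncCursor(updatedAt, id)
}

// Strictly-greater filter, then ORDER BY updated_at, id LIMIT n.
fun pullPage(rows: List<SyncRow>, since: SyncCursor, limit: Int): List<SyncRow> =
    rows.filter { it.cursor > since }
        .sortedBy { it.cursor }
        .take(limit)
```

The client stores the cursor of the last row in each page; rows sharing a timestamp are then split across pages without being skipped or replayed.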

Phase: Sync.


Pitfall 12: Haze on scroll + nested children tank older iPhones

What goes wrong: LazyColumn scrolling under a blurred top bar stutters badly on iPhone XR/11, dropping to ~30 fps. Nesting hazeChild inside a list item sitting in a hazeSource Scaffold makes it worse.

Why: iOS Haze uses Skiko GraphicsLayer for offscreen capture + re-blur each frame. Progressive blur adds ~25% cost. Older A-series chips without hardware-accelerated RenderEffect equivalents jank under this load.

Warning signs: Smooth on simulator/M-series, choppy on iPhone 11; FPS 40–50; Skiko render thread pegged in Instruments.

How to avoid: One hazeSource per screen, never nested. Limit blur to chrome (tab bar, nav bar, sheet headers), not scrolling content. Avoid progressive blur on iOS pre-iPhone 13. Test on the oldest target device in real hardware. Feature-flag the effect with a solid-translucent fallback.

Phase: UI chrome.


Pitfall 13: Navigation-CMP tabs — when-switch kills per-tab back stack

What goes wrong: Tabs implemented as when (tab) { 0 -> RecipesScreen()... }. Tapping into a detail, switching tabs, and returning loses the detail. System back exits the app instead of unwinding the tab.

Why: A when switch destroys the non-current tab's Compose tree. Jetpack Navigation's multi-back-stack requires either each tab as a destination in a parent NavHost, or per-tab nested NavHost instances, with popUpTo(saveState) + restoreState + launchSingleTop.

Warning signs: Deep-links don't restore; back from a nested screen jumps tabs; ViewModels re-created on tab switches.

How to avoid: One top-level NavHost; navigation(route = "recipesGraph", ...) block per tab. Bottom bar navigates: popUpTo(graph.findStartDestination().id) { saveState = true }; launchSingleTop = true; restoreState = true. Scope koinViewModel() to the destination's NavBackStackEntry, not the parent graph. Wasm deep-links are deferred per PROJECT.md.

Phase: UI chrome.


Pitfall 14: Polish locale — plurals and timestamp zones

What goes wrong: "added 2 godzina temu" (wrong plural form). Shopping items near midnight show on the wrong day across devices.

Why: Polish has four CLDR plural forms (one / few / many / other). Naive if (n == 1) handles at most two. Serializing LocalDateTime over the wire (instead of UTC Instant) produces zone/DST bugs.

Warning signs: Grammatically wrong Polish copy; yesterday's items shown as today's.

How to avoid: Use Compose Resources <plurals> with all four forms; call pluralStringResource(resource, count). Wire format: Instant UTC ISO-8601 only; display: .toLocalDateTime(TimeZone.currentSystemDefault()). Unit test plurals with count 0/1/2/5/22.
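For writing the plural unit tests, it helps to have the expected category for each count spelled out. The sketch below encodes the CLDR rule for Polish restricted to non-negative integers; the names are hypothetical, and production code should still go through the resource system rather than this function.

```kotlin
enum class PluralCategory { ONE, FEW, MANY, OTHER }

// CLDR plural rule for Polish, integers only.
// OTHER applies to fractional quantities and is never returned here.
fun polishPluralCategory(n: Int): PluralCategory {
    require(n >= 0) { "count must be non-negative" }
    val mod10 = n % 10
    val mod100 = n % 100
    return when {
        n == 1 -> PluralCategory.ONE                                 // 1 godzina
        mod10 in 2..4 && mod100 !in 12..14 -> PluralCategory.FEW    // 2 godziny, 22 godziny
        else -> PluralCategory.MANY                                  // 0 godzin, 5 godzin, 12 godzin
    }
}
```

The 0/1/2/5/22 test set covers all three integer categories, and adding 12 guards the teen exception.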

Phase: UI chrome (i18n foundation).


Technical Debt Patterns

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Ad-hoc psql DDL, skipping Flyway | Fast schema iteration | Dev/prod drift; can't rebuild from scratch | Pre-first-deploy only; squash into V1__init.sql before real data |
| Hardcoded OIDC issuer/client_id in shared/commonMain | Avoids build-config plumbing | Can't run against staging Authentik; Authentik change forces rebuild | v1 single-environment only |
| Plain transaction {} in admin endpoints | Simpler mental model | Mixing blocking + suspend patterns leaks; eventually every endpoint wants suspend | Admin-only, single-user endpoints |
| Free-form meal_entry.extras JSONB without schema | Evolve without migrations | No DB validation; orphan fields accumulate; hard to query | Until extras shape stabilizes; then promote hot fields to columns |
| No indices until queries are slow | Faster early dev | p99 cliffs during sync; adding indices under load is risky | Until first data import; then index every (household_id, updated_at) |

Integration Gotchas

| Integration | Common Mistake | Correct Approach |
|---|---|---|
| Authentik OIDC | Confidential client type with secret shipped in binary | Public client + PKCE S256; never ship client_secret |
| Authentik OIDC | Leaving default signing alg; Ktor JWT expects RS256 | Configure RS256 explicitly; verify kid resolves via JWKS |
| Haze + Scaffold | hazeSource on Scaffold root + hazeChild on a sheet both capturing | hazeSource on scrollable content only; chrome uses hazeChild |
| App Store / TestFlight | ATS exception to reach homelab self-signed cert | Real cert via Let's Encrypt + Caddy/Traefik; never ship ATS exceptions |
| Postgres JSONB | WHERE extras->>'k' = 'v' with no GIN index | CREATE INDEX ... USING GIN (extras jsonb_path_ops) once access patterns emerge |

Performance Traps

| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| Pull sync without pagination | First-sync-after-seed hangs seconds | Cursor-paginate LIMIT 200 ORDER BY updated_at, id | >500 rows in any scoped table |
| Coil full-res images in recipe grid | Memory spikes, laggy scroll | Explicit thumbnail Size; memory+disk cache | >30 images on screen |
| Compose recomposition of entire calendar per edit | Calendar flashes on slot change; scroll resets | Stable IDs per slot; hoist per-slot state; derivedStateOf for totals | Any calendar with >7 days visible |
| Haze over full scrolling region | Jank on iPhone XR/11 | Blur chrome only, not content; fallback for old devices | Pre-A13 silicon on 60 Hz panels |

Security Mistakes

| Mistake | Risk | Prevention |
|---|---|---|
| Missing WHERE household_id = :caller_household on reads | Cross-household data leak | All scoped reads go through a HouseholdScope helper; review rule: no raw selectAll() on scoped tables |
| Trusting client-supplied household_id in request body | Tenancy bypass via crafted POST | Derive household_id from JWT sub → memberships; ignore body's value |
| Logging the Authorization header in Ktor CallLogging | Tokens leak to log files → account compromise | Custom log filter redacting Authorization; never log.info(token) |
| Storing OIDC refresh token in plain prefs | Local/backup exposure | multiplatform-settings with Keychain (iOS) / EncryptedSharedPreferences (Android) backends |

"Looks Done But Isn't" Checklist

  • Auth: Login works — verify token refresh runs before expiry (set Authentik access-token lifetime to 5 min in dev; watch for silent 401s)
  • Sync: Pull works — verify tombstones propagate (delete on A, confirm gone on B after pull, not just after push)
  • Sync: Offline writes survive app kill + relaunch + reconnect — not just a warm resume
  • Household isolation: Log in as household B; hit every endpoint; assert zero household A rows returned
  • SQLDelight migrations: Install prior release, launch once, upgrade in place; confirm no crash, no data loss
  • Polish plurals: Open every screen with counts 0, 1, 2, 5, 22; verify grammar
  • Haze performance: Test on oldest supported device (iPhone XR/11) scrolling a full screen; not just simulator

Pitfall-to-Phase Mapping

| Pitfall | Prevention Phase | Verification |
|---|---|---|
| K/N GC thrash; objcDisposeOnMain | UI chrome (infra) | Gradle property set; Instruments shows no GC-main domination |
| Legacy freeze() ceremony | Data | Code search for freeze(, @SharedImmutable returns empty |
| UIViewController re-creation | UI chrome | State survives background/foreground cycle |
| SQLDelight missing migration | Data | Prior-build → new-build upgrade test on real device |
| Blocking Exposed transaction in suspend | Data | No transaction { in suspend paths; 50-concurrent-request load test with pool size 10 |
| DAO + JSONB | Data | No exposed.dao.* imports; per-JSONB-column round-trip test |
| JWT aud/iss/leeway/JWKS | Auth | Wrong-aud → 401; 30 s skew → 200; JWKS refreshes within 15 min |
| OIDC redirect URI / PKCE | Auth | Flow passes on iOS and Android; Authentik logs show code_challenge per request |
| LWW client-clock trust | Sync | All writes set updated_at server-side; clients never send it |
| Soft-delete recreate race | Sync | Two-client alternating delete/recreate converges |
| Pull-cursor edge cases | Sync | Cursor is (updated_at, id) lexicographic; same-timestamp test |
| Haze scroll jank | UI chrome | iPhone 11 real-device FPS >55 on recipe grid scroll |
| Nested NavHost / multi-back-stack | UI chrome | Tab switch preserves deep state; system back unwinds within tab |
| Polish plurals / timestamps | UI chrome | Plural unit tests pass; wire format is UTC-only |
| Household tenancy bypass | Auth + Sync | Cross-household read test asserts empty result sets |

Sources


Pitfalls research for: Kotlin Multiplatform recipe/meal-planning app with self-hosted Ktor + Postgres + Authentik backend Researched: 2026-04-23