Costly, an open-source SDK that wraps the Anthropic Claude API to monitor and reduce LLM spending, launched in beta this week targeting Node.js and TypeScript developers. The tool initializes via a single CLI command that scans a codebase for Anthropic SDK usage, instruments the relevant files, and connects the project to a hosted dashboard — all without requiring access to users' Anthropic API keys. The SDK logs only metadata (model name, token counts, estimated cost, and latency) asynchronously using fire-and-forget batching, adding zero latency to API calls.
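A minimal sketch of what that fire-and-forget pattern looks like in practice — the names (`MetadataBatcher`, `instrumented`) and the metadata shape are illustrative assumptions, not Costly's actual API:

```typescript
// A sketch of fire-and-forget metadata capture around an LLM call.
// Only metadata is buffered; request and response bodies never leave the process.

interface CallMetadata {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

// Buffers metadata synchronously, ships it later, off the request path.
class MetadataBatcher {
  private buffer: CallMetadata[] = [];

  get pending(): number {
    return this.buffer.length;
  }

  record(meta: CallMetadata): void {
    this.buffer.push(meta);
    if (this.buffer.length >= 25) this.flush();
  }

  flush(): void {
    const batch = this.buffer.splice(0);
    if (batch.length === 0) return;
    // Fire-and-forget: schedule the upload without awaiting it, so a slow
    // or failing telemetry endpoint can never delay the caller.
    setTimeout(() => void sendBatch(batch), 0);
  }
}

// Stand-in for the upload to a hosted dashboard (hypothetical endpoint).
async function sendBatch(batch: CallMetadata[]): Promise<void> {
  // e.g. fetch("https://dashboard.example.com/ingest", { method: "POST", ... })
}

// Wrap any Anthropic-style call: latency is measured around it, usage is
// read off the response, and the response is returned to the caller untouched.
async function instrumented<
  T extends { model: string; usage: { input_tokens: number; output_tokens: number } },
>(batcher: MetadataBatcher, call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const response = await call();
  batcher.record({
    model: response.model,
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    latencyMs: Date.now() - start,
  });
  return response;
}
```

Because `record` only pushes onto an in-memory array before the response is handed back, the API call itself never waits on the telemetry path.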

Seven named waste detectors — prompt bloat, model overkill, duplicate queries, runaway features, error waste, output bloat, and cost trajectory — drive the tool's analysis. Instead of raw dashboard charts, each detector produces a prescriptive recommendation with a dollar amount attached. The documentation makes the approach concrete: a 2,100-token system prompt sent identically across 14,000 monthly calls costs $89 per month, and the tool flags it with a specific fix — Anthropic's prompt caching feature — rather than just surfacing the number. Cost calculations pull directly from the model and token counts in each API response, matched against Anthropic's official per-token pricing.
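The arithmetic behind that flag is easy to reproduce. The $3-per-million-input-token rate below is an assumption (a Sonnet-class input price) chosen because it lands on the documentation's figure:

```typescript
// Reproducing the prompt-bloat example's math. The $3/MTok input rate is an
// assumed Sonnet-class price, not a number taken from Costly's docs.
function monthlyPromptCostUSD(
  promptTokens: number, // system prompt size, re-sent on every call
  callsPerMonth: number,
  usdPerMillionInputTokens: number,
): number {
  return (promptTokens * callsPerMonth * usdPerMillionInputTokens) / 1_000_000;
}

// 2,100 tokens × 14,000 calls = 29.4M prompt tokens per month.
const waste = monthlyPromptCostUSD(2_100, 14_000, 3); // → 88.2, i.e. the docs' ~$89/month
```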

LangSmith, <a href="/news/2026-03-14-claudetop-real-time-token-cost-monitor-for-claude-code-sessions">Claudetop</a>, Helicone, and Braintrust already occupy the LLM observability space, but none map cleanly onto what Costly is doing. Helicone is the closest — open-source, cost-focused, low-overhead — but it operates as a proxy, routing API traffic through its own infrastructure. Costly's SDK wrapper sits entirely outside the request path, which sidesteps both latency risk and the security concern of sending full request and response payloads through a third-party service. The tradeoff is scope: Phase 1 supports only Anthropic's Claude SDK, with additional provider support described as forthcoming.
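The architectural difference can be sketched in a few lines. Both functions below are conceptual stand-ins with hypothetical URLs, not code from either product; the transport is injected so the contrast is visible without a network:

```typescript
// A send function standing in for an HTTP client (injected for illustration).
type Transport = (url: string, body: string) => Promise<{ status: number }>;

// Proxy model (Helicone-style): the observer's URL replaces the provider's,
// so the full request/response payload transits third-party infrastructure,
// and any proxy slowdown or outage sits in the request path.
function viaProxy(send: Transport, body: object) {
  return send("https://proxy.example.com/v1/messages", JSON.stringify(body));
}

// Wrapper model (Costly-style): the call goes straight to the provider;
// only metadata leaves the process, and only after the response has returned.
async function viaWrapper(
  send: Transport,
  body: object,
  logMetadata: (m: { latencyMs: number; status: number }) => void,
) {
  const start = Date.now();
  const res = await send("https://api.anthropic.com/v1/messages", JSON.stringify(body));
  // Fire-and-forget: a slow or failing logger cannot delay or break the call.
  queueMicrotask(() => logMetadata({ latencyMs: Date.now() - start, status: res.status }));
  return res;
}
```

The wrapper's narrower blast radius is what the article's "outside the request path" claim amounts to: the worst a broken telemetry backend can do is drop metadata, never an API call.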

Costly is free during beta, no credit card required — one project, 30 days of data retention, and access to all seven detectors. The SDK is fully open source on GitHub; the dashboard and waste detection engine are managed services. The company notes that token waste carries an environmental cost too, pointing to energy and water consumption at AI data centers as a secondary reason to optimize — a framing that, given the beta launch's primary pitch, may resonate more once usage scales.