DeepSeek
AI AssistantsReasoning and code AI with free web access
AISH may earn a commission · How we fund this site
AISH Bottom Line
DeepSeek's V4 family ships two API models (Flash for speed, Pro for reasoning) with 1M context and 384K max output, available free on web and mobile. Both support thinking mode by default, letting you see the chain-of-thought before the final answer. At $0.28/M tokens for Flash output, it undercuts OpenAI on comparable tasks while matching Anthropic on reasoning depth.
Pros & Cons
Pros
Thinking mode is the default, not a cost-add-on
Both V4-Flash and V4-Pro enable thinking mode by default, producing visible reasoning traces. This transparency surfaces the model's logic and helps you catch errors in multi-step tasks. Output runs up to 384K tokens, so extended reasoning chains don't get cut off. Why it matters: You see what the model is reasoning through, making outputs more trustworthy and debuggable.
1M token context at commodity pricing
V4 brings 1M context (8x previous) to both Flash ($0.14/M cache miss) and Pro ($0.435/M with discount). Repeated prefix tokens hit the cache at $0.0028 or $0.003625 per M. No long-context tier premium. Why it matters: Bulk processing, RAG, and multi-document workflows become cost-competitive against expensive long-context offerings.
Drop-in replacement for OpenAI SDKs
Change your base_url from api.openai.com to api.deepseek.com and your code continues to work. Tool calls, streaming, system prompts, JSON mode, and thinking blocks all parse the same way. Anthropic SDK format also works. Why it matters: Teams can A/B test DeepSeek without rewriting integrations.
Cons
FIM completion only works in non-thinking mode
The Fill-in-the-Middle feature (code completion) was restricted to standard mode in V4. Use V4-Flash non-thinking for inline code suggestions; switch to thinking mode or V4-Pro for reasoning tasks. Impact: Teams doing both in-IDE completion and reasoning workflows must select the right model per task rather than toggling a parameter.
Thinking output tokens still consume your quota
Reasoning traces count toward your output token limit. Long reasoning chains on V4-Pro (default thinking) can push output use 3-5x higher than a direct answer. Cached reasoning helps but doesn't eliminate the cost. Impact: Cost-sensitive apps may need to benchmark thinking vs. non-thinking trade-offs.
Pricing
Free (Web & App)
General users wanting free AI chat on web, iOS, and Android
- Full DeepSeek-V4-Flash model access in chat mode
- Web search
- Thinking mode (chain-of-thought)
- File upload with text extraction
- Cross-platform chat history sync
- No ads or in-app purchases
deepseek-v4-flash API
Developers and businesses requiring fast general-purpose inference
- 1M context length
- Max output 384K tokens
- Thinking and non-thinking modes
- JSON output
- Tool calls
- Chat prefix completion (Beta)
- FIM completion (non-thinking mode only)
- Context caching (cache hit: $0.0028/1M tokens, cache miss: $0.14/1M tokens)
- Output: $0.28/1M tokens
deepseek-v4-pro API
Developers requiring maximum reasoning performance
- 1M context length
- Max output 384K tokens
- Thinking (default) and non-thinking modes
- JSON output
- Tool calls
- Chat prefix completion (Beta)
- Context caching
- 75% launch discount through 2026-05-31
- Pay-as-you-go top-up billing
Plans and prices can change — always verify on the vendor's site.
Visit DeepSeek →AISH may earn a commission · How we fund this site
Features
DeepSeek-V4 Models
DeepSeek's V4 model family includes two API tiers — V4-Flash for fast general-purpose inference and V4-Pro for maximum reasoning performance. Both run on a Mixture-of-Experts architecture with a 1M token context window and 384K max output. Models are accessible via the free web chat at chat.deepseek.com, iOS/Android apps, and the REST API. Context caching reduces repeated prefix costs by 90%.
Thinking Mode (Chain-of-Thought Reasoning)
Before outputting a final answer, DeepSeek's thinking mode produces a chain-of-thought reasoning trace (reasoning_content) to improve accuracy on complex tasks. Thinking mode is the default on V4-Pro and available on V4-Flash. Outputs run up to 384K tokens for extended reasoning chains.
Tool Calls (Function Calling)
DeepSeek's API allows the model to call external functions and tools to enhance its capabilities. Supported in both thinking and non-thinking modes, with a strict mode (Beta) that enforces exact JSON schema compliance on tool outputs.
JSON Output (Structured Output)
DeepSeek provides a JSON Output mode that guarantees the model returns valid, parseable JSON strings. Users set the response_format parameter to json_object and include a JSON example in their prompt to guide structured output for downstream parsing.
Context Caching (KV Cache on Disk)
The DeepSeek API automatically caches request prefixes on disk for all users at no extra configuration. Repeated prefix tokens in subsequent requests trigger a cache hit, reducing cost from $0.28 to $0.0028 per 1M input tokens — a 90% discount.
FIM Completion — Fill in the Middle (Beta)
Users can supply a code or text prefix and optional suffix, and the model fills in the middle content. Commonly used for code completion and content completion tasks, integrating natively with VS Code via the Continue plugin. Non-thinking mode only (V4 restriction). In V3.2, FIM was available in both modes; V4 restricts it to non-thinking mode.
Chat Prefix Completion (Beta)
Developers can provide an assistant message prefix and force the model to continue from exactly that point, enabling precise control over output format and style — for example, prefixing with a Python code block to guarantee code-only output.
Multi-round Conversation Support
The DeepSeek chat API supports stateful multi-turn conversations by allowing developers to concatenate full conversation history per request. The API is stateless by design, giving developers full control over context management across conversation turns.
OpenAI-Compatible API Format
The DeepSeek API is fully compatible with the OpenAI SDK and format, requiring only a base_url swap to https://api.deepseek.com. Existing OpenAI integrations can switch to DeepSeek models with minimal code changes.
Anthropic API Compatibility
DeepSeek's API supports the Anthropic API format, allowing teams using the Anthropic SDK or Claude Code to route requests to DeepSeek models. Supported fields include streaming, system prompts, tool use, thinking blocks, and temperature control.
DeepSeek Web Chat & Mobile App
DeepSeek offers a free web chat at chat.deepseek.com and a mobile app on iOS and Android. Features include web search, thinking mode (chain-of-thought), file upload with text extraction, and cross-platform chat history sync — with no ads or in-app purchases.
Open-Source Model Weights
DeepSeek releases model weights for DeepSeek-V4, DeepSeek-R1, DeepSeek-Coder V2, DeepSeek-VL, and others on Hugging Face under open licenses. This enables self-hosting, fine-tuning, and research use without API dependency.
Integrations
Use Cases
Existing OpenAI SDK codebases switch to DeepSeek by changing one line (the base_url), keeping all tool-calling logic and streaming intact. V4-Flash delivers GPT-4 level output performance at Flash pricing ($0.28/M tokens output). Supports Anthropic SDK format too, so teams running both OpenAI and Anthropic models can route to DeepSeek without refactoring.
Teams use V4-Pro's thinking mode to construct autonomous agents that reason before acting. The model outputs chain-of-thought internally, then calls external tools (database queries, API hits, calculations) based on that reasoning. Thinking blocks remain cached across turns, reducing token cost by 90% when the same reasoning pattern repeats. Ideal for code review workflows, data validation pipelines, and itinerary planning agents.
Developers send 500K+ token documents (entire codebases, research papers, contract batches) to V4's 1M context window for semantic search, cross-file refactoring suggestions, or bulk summarization. Context caching keeps repeated sections cheap ($0.0028/M cached tokens). Previously hitting 128K limits mid-document is now a non-issue.
Individual developers use chat.deepseek.com to brainstorm APIs, debug code snippets, and refactor logic without paying per turn. The free web chat runs V4-Flash with thinking mode enabled, file uploads, and web search. Zero setup friction, no credit card required to start.
Engine-Analysed
Data extracted and structured by the AISH Analysis Engine, not manually curated or vendor-submitted.
Verified & Dated
Pricing, features, and availability verified against DeepSeek's public pages.
Editorially Independent
AISH may earn affiliate commissions. This never influences our analysis, scoring, or recommendations.
Alternatives
Google Gemini
Google's conversational AI assistant available as free web app and mobile app with API access. Supports research, writing, and coding tasks with multi-turn conversations.
Claude (Anthropic)
Conversational AI assistant with reasoning, available via web, app, and API. Offers multiple model sizes (Haiku, Sonnet, Opus) with multi-turn conversation and tool use. Free tier available.
ChatGPT (OpenAI)
Conversation-based AI assistant with web, app, and API access. Multiple model tiers including reasoning-focused variants. Free tier with usage limits.