● 18 frontier models · one agentic endpoint

The agentic gateway to every frontier model

One OpenAI-compatible API key for DeepSeek-R1, Llama 3.3, Qwen Coder, Mistral, FLUX and more — with native tool-calling and streaming for agents. Usage-based pricing, instant keys, built-in analytics. Zero infrastructure.


No credit card required · $1 of free usage every month

chat.completions

curl https://api.speka.online/v1/chat/completions \
  -H "Authorization: Bearer sk-speka-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/deepseek-r1",
    "messages": [{"role":"user","content":"Explain quantum entanglement simply."}]
  }'

Already using OpenAI? Just change the base_url and key.
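For readers not using curl, here is a minimal Python sketch of the same request using only the standard library. The base URL, path, headers and model id come from the curl example; the helper names `build_chat_request` and `send` are illustrative, and the key is passed in by the caller rather than hard-coded.

```python
import json
import urllib.request

BASE_URL = "https://api.speka.online/v1"  # taken from the curl example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the official OpenAI Python SDK the same swap is two constructor arguments, `OpenAI(base_url=..., api_key=...)`, and the rest of the calling code stays as it is.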

Frontier models from the labs you already trust

DeepSeek · NVIDIA · Mistral AI · Qwen · Google · Meta · Microsoft · Black Forest Labs

18+ models available
99.9% uptime target
<1s time to first token*
100% OpenAI-compatible

Why Speka

Built for agents, ready for production

Skip the infrastructure, the rate-limit headaches and the per-vendor SDKs. One gateway, one bill, full observability — so you can ship agents, not plumbing.

Every model, one key

Reasoning, chat, code, vision, embeddings and image generation — all behind a single endpoint and key.

Agent-native API

Native tool-calling, streaming and JSON mode. Point the OpenAI SDK at our base URL — no rewrites, no lock-in.
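As a rough sketch of what tool-calling plus streaming looks like on the wire, the snippet below builds an OpenAI-style request with one tool and reassembles a streamed tool call from server-sent event lines. It assumes the gateway emits the same chunk shape as OpenAI's Chat Completions stream (`choices[].delta.tool_calls[]` fragments keyed by `index`, ending with `data: [DONE]`); the `get_weather` tool and both function names are hypothetical.

```python
import json


def build_tool_request(model: str, question: str) -> dict:
    """OpenAI-style streaming chat request that offers one (hypothetical) tool."""
    return {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }


def collect_tool_calls(sse_lines) -> list:
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(frag["index"], {"name": "", "arguments": ""})
                fn = frag.get("function", {})
                call["name"] += fn.get("name") or ""
                # Arguments arrive as pieces of one JSON string; concatenate first.
                call["arguments"] += fn.get("arguments") or ""
    return [
        {"name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]
```

The arguments are parsed only after the stream ends, because an individual fragment is rarely valid JSON on its own.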

Predictable rate limits

Per-plan requests-per-minute (RPM) and concurrency limits, enforced fairly and reported in clear headers so you are never surprised.
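On the client side, a rate-limited request is usually handled with a short wait and a retry. The sketch below is one way to pick that wait. This page does not name the headers the gateway sends, so the use of the standard HTTP `Retry-After` header with a 429 status is an assumption, and `retry_delay` is an illustrative helper.

```python
import random


def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 0.5, cap: float = 30.0):
    """Return seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form of Retry-After: fall back to backoff below
    # No usable hint from the server: exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Capping the delay keeps a single misbehaving response from stalling an agent loop for minutes.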

Usage-based pricing

Pay per token. Monthly included usage on every plan plus pay-as-you-go overage. No lock-in.

Built-in analytics

Track spend, tokens and latency by key, model and day — right in your dashboard.

Secure by design

Keys are hashed at rest, scoped per project, and revocable instantly. Your data is isolated at the row level.

Model catalog

Popular models, ready to call

Frontier open models, production-ready and priced transparently.


DeepSeek-R1

DeepSeek

reasoning

State-of-the-art open reasoning model with transparent chain-of-thought. Excels at math, logic and multi-step problem solving.

deepseek-ai/deepseek-r1

Llama 3.1 Nemotron 70B

NVIDIA

reasoning

NVIDIA's reward-tuned Llama variant optimized for helpfulness and instruction following. A strong general-purpose workhorse.

nvidia/llama-3.1-nemotron-70b-instruct

Llama 3.3 70B Instruct

Qwen2.5 Coder 32B

Qwen

code

Best-in-class open code model. Excellent at generation, completion, refactoring and bug fixing.

qwen/qwen2.5-coder-32b-instruct

Llama 3.2 90B Vision

FLUX.1 [dev]

Black Forest Labs

image

High-fidelity text-to-image generation with excellent prompt adherence and typography.

black-forest-labs/flux.1-dev

$0.04 per image · Open model
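Because FLUX.1 [dev] is priced per image rather than per token, estimating a batch is simple multiplication. The sketch below pairs that estimate with an OpenAI-style image request body. The model id and the $0.04 price come from the card above; the `/v1/images/generations` path and the `n` parameter are assumed to mirror OpenAI's Images API, and the helper names are illustrative.

```python
import json
from decimal import Decimal

PRICE_PER_IMAGE_USD = Decimal("0.04")   # from the catalog card
IMAGES_PATH = "/v1/images/generations"  # assumption: same path as OpenAI's Images API


def build_image_request(prompt: str, n: int = 1) -> str:
    """Serialize an OpenAI-style image generation body for FLUX.1 [dev]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return json.dumps({
        "model": "black-forest-labs/flux.1-dev",
        "prompt": prompt,
        "n": n,  # assumption: the gateway accepts a per-request image count
    })


def estimate_cost_usd(n: int) -> Decimal:
    """Flat per-image pricing: cost grows linearly with the number of images."""
    return PRICE_PER_IMAGE_USD * n
```

Using `Decimal` avoids floating-point drift when many small charges are summed.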

How it works

Live in three steps

Create an account

Point your SDK

Set the base URL to our gateway and use any model id. Works with the OpenAI SDK, LangChain and Vercel AI SDK.

Ship & scale

Monitor usage, set limits and upgrade as you grow. We handle routing, failover and infrastructure.

Trust & security

Secure by default

Your keys and data are protected at every layer — so you can build with confidence.

Hashed API keys

Keys are hashed at rest and shown only once. Scope them per project and revoke instantly.
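To make "hashed at rest and shown only once" concrete, here is a minimal sketch of the general pattern. It is not a description of Speka's implementation: the hash function (SHA-256), the key prefix and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets


def issue_key(prefix: str = "sk-example-") -> tuple:
    """Create a key. Return (plaintext, digest); only the digest is stored."""
    plaintext = prefix + secrets.token_urlsafe(32)  # high-entropy random secret
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare against the stored digest in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_digest)
```

Since the server keeps only the digest, a lost key cannot be recovered, which is why it is displayed once at creation and replaced rather than re-shown.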

Encrypted in transit

All traffic is TLS-encrypted on every hop: from your app to our gateway, and from our gateway to model providers.

Row-level isolation

Your usage, keys and metadata are isolated per account at the database level.

No training on your prompts

We pass prompts through to generate responses — never to train models. Delete your data anytime.

From the field

Teams ship faster on Speka

“Swapped our OpenAI base URL and had DeepSeek-R1 in prod the same afternoon. The tool-calling just worked.”

A. Nakamura

Staff Engineer

“One bill for every model we test means we stopped juggling six vendor dashboards. Spend is finally legible.”

M. Okafor

Founder, AI startup

“Failover across capacity has kept our agents up through provider outages. That reliability earned our trust.”

J. Petrov

Platform Lead

Pricing

Start free. Scale when you do.

Every plan includes monthly usage. Only pay more when you outgrow it.

Free

$0.00 /mo

Kick the tires. No card required.


$1 of model usage / month
10 requests / minute
1 API key
Access to all open models

Starter

$19.00 /mo

For indie hackers and side projects.


$25 of model usage included
60 requests / minute
5 API keys
All models incl. vision & image

Pro

$99.00 /mo

For production apps and growing teams.


$150 of model usage included
300 requests / minute
25 API keys
Priority routing & lower latency

Scale

$399.00 /mo

For high-volume, latency-sensitive workloads.


$750 of model usage included
1,200 requests / minute
200 API keys
Highest-priority routing

See the full breakdown on the pricing page.

FAQ

Questions, answered

Is it really OpenAI-compatible?

Yes. We implement the OpenAI Chat Completions, Embeddings and Images endpoints. Point any OpenAI SDK or client at our base URL, set your Speka key, and use any model id from the catalog — no other code changes.

Where do the models run?

Models run on our managed inference infrastructure across multiple capacity providers. We route each request to healthy capacity and fail over automatically, so you get one stable endpoint instead of managing several vendors.
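Conceptually, automatic failover means trying the next healthy backend when one fails. The sketch below shows that idea in its simplest form. It is an illustration only, not Speka's routing logic: the provider list, the `send` callables and the `AllProvidersFailed` exception are hypothetical.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has been tried without success."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[dict], dict]]],
    request: dict,
) -> Tuple[str, dict]:
    """Try providers in priority order and return (provider_name, response)."""
    errors = []
    for name, send in providers:
        try:
            return name, send(request)
        except Exception as exc:  # a real router would retry only transient errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A production router would also track provider health so that known-bad capacity is skipped up front instead of being retried on every request.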

How does billing work?

Every plan includes a monthly usage allowance measured in real model spend. You're billed per token at transparent published rates. Exceed your allowance and you simply pay standard per-token rates — no overage penalties.
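The billing rule above reduces to simple arithmetic. The sketch below uses the plan fees and allowances from the pricing section and assumes that usage beyond the allowance is billed dollar for dollar at the published per-token rates. Whether the Free plan permits overage at all is not stated on this page, so treat that row as an assumption; `monthly_bill` is an illustrative helper.

```python
from decimal import Decimal

# (monthly fee, included model usage) in USD, from the pricing section
PLANS = {
    "free": (Decimal("0.00"), Decimal("1")),
    "starter": (Decimal("19.00"), Decimal("25")),
    "pro": (Decimal("99.00"), Decimal("150")),
    "scale": (Decimal("399.00"), Decimal("750")),
}


def monthly_bill(plan: str, usage_usd: str) -> Decimal:
    """Plan fee plus any model spend beyond the plan's included allowance."""
    fee, included = PLANS[plan]
    overage = max(Decimal("0"), Decimal(usage_usd) - included)
    return fee + overage
```

For example, $30 of model usage on Starter comes to $19 plus $5 of overage, or $24 for the month, while $100 of usage on Pro stays at the flat $99.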

Can I set spending and rate limits?

Yes. Set hard spend caps and per-key rate limits from your dashboard, and get alerts as you approach them. Keys can be scoped per project and revoked instantly.

Do you store my prompts?

We transmit prompt content to model providers solely to generate responses. We store request metadata (model, token counts, latency) for analytics and billing, but we do not use your prompts to train models.

What happens on the free plan?

You get $1 of model usage every month, 1 API key, 10 requests per minute and access to all open models — enough to prototype and test. Upgrade only when you need more.

Build with frontier AI today

Join developers shipping faster with Speka. Your first key is free and takes 30 seconds.
