supermemory Infinite Chat is a powerful solution that gives your chat applications unlimited contextual memory. It works as a transparent proxy in front of your existing LLM provider, intelligently managing long conversations without requiring any changes to your application logic.

Unlimited Context: No more token limits; conversations can extend indefinitely.

Zero Latency: Transparent proxying with negligible overhead.

Cost Efficient: Save up to 70% on token costs for long conversations.

Provider Agnostic: Works with any OpenAI-compatible endpoint.

Getting Started

To use the Infinite Chat endpoint, you need to:

1. Get a supermemory API key

Head to supermemory’s Developer Platform, where you can create an API key and monitor and manage every aspect of the API.

You can now use this key to authenticate requests to the supermemory API.

Next, place supermemory in front of your LLM provider’s endpoint.

2. Add supermemory in front of any OpenAI-compatible API URL

import OpenAI from "openai";

// Route requests through the supermemory proxy by prefixing your provider's
// base URL with https://api.supermemory.ai/v3/ and passing your supermemory
// API key in the x-api-key header.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/https://api.openai.com/v1",
  defaultHeaders: {
    "x-api-key": process.env.SUPERMEMORY_API_KEY,
  },
});
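
With the client configured, chat completion calls work exactly as they would against the provider directly; the model name below is only illustrative. A minimal sketch of a request through the proxy:

const completion = await client.chat.completions.create({
  model: "gpt-4o", // any model your provider supports
  messages: [{ role: "user", content: "Hello, what can you remember about me?" }],
});

console.log(completion.choices[0].message.content);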

How It Works

1. Transparent Proxying

All requests pass through supermemory to your chosen LLM provider with negligible latency overhead.

2. Intelligent Chunking

Long conversations are automatically broken into optimized segments using our proprietary chunking algorithm, which preserves semantic coherence.

3. Smart Retrieval

When a conversation grows beyond the 20k-token threshold, supermemory retrieves the most relevant context from previous messages, so the client can keep sending the full history unchanged (see the sketch after these steps).

4. Automatic Token Management

The system balances token usage across the thread, maintaining performance while minimizing costs.
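
Because supermemory selects context server-side, the application can keep appending to a single message array even after a thread passes the 20k-token threshold; no trimming logic is needed on the client. A minimal sketch, assuming the client configured above (the helper function and model name are illustrative, not part of the supermemory API):

const messages = [];

async function ask(question) {
  messages.push({ role: "user", content: question });

  // The full history is sent on every call; supermemory handles chunking
  // and retrieval of relevant context once the thread becomes long.
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
  });

  const reply = completion.choices[0].message;
  messages.push(reply);
  return reply.content;
}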

Performance Benefits

Pricing

Free Tier: 100k tokens stored at no cost

Standard Plan: $20/month fixed cost after exceeding the free tier

Usage-Based: Each thread includes 20k free tokens, then $1 per million tokens thereafter
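
For example, under the usage-based rate a single thread that processes 520k tokens would use its 20k included tokens and then be billed for the remaining 500k at $1 per million, roughly $0.50 on top of the fixed plan cost.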

Error Handling

supermemory is designed with reliability as the top priority. If anything goes wrong in the supermemory processing pipeline, the system automatically falls back to forwarding your request directly to the LLM provider, so your application sees no downtime.

Each response includes diagnostic headers that provide information about the processing:

x-supermemory-conversation-id: Unique identifier for the conversation thread

x-supermemory-context-modified: Indicates whether supermemory modified the context ("true" or "false")

x-supermemory-tokens-processed: Number of tokens processed in this request

x-supermemory-chunks-created: Number of new chunks created from this conversation

x-supermemory-chunks-deleted: Number of chunks removed (if any)

x-supermemory-docs-deleted: Number of documents removed (if any)

If an error occurs, an additional header x-supermemory-error will be included with details about what went wrong. Your request will still be processed by the underlying LLM provider even if supermemory encounters an error.
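
If you want to inspect these headers, you can call the proxy's chat completions endpoint directly and read them from the response. A minimal sketch using fetch, assuming the standard OpenAI /chat/completions path is forwarded unchanged:

const res = await fetch(
  "https://api.supermemory.ai/v3/https://api.openai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "x-api-key": process.env.SUPERMEMORY_API_KEY,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  },
);

console.log("conversation:", res.headers.get("x-supermemory-conversation-id"));
console.log("context modified:", res.headers.get("x-supermemory-context-modified"));

const smError = res.headers.get("x-supermemory-error");
if (smError) {
  // The request is still forwarded to the LLM provider even when this header is set.
  console.warn("supermemory error:", smError);
}

const data = await res.json();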

Rate Limiting

Currently, there are no rate limits specific to supermemory. Your requests are subject only to the rate limits of your underlying LLM provider.

Supported Models

supermemory works with any OpenAI-compatible API, including:

OpenAI: GPT-3.5, GPT-4, GPT-4o

Anthropic: Claude 3 models

Other Providers: Any provider with an OpenAI-compatible endpoint
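
The same pattern extends to any such provider: prefix the provider's OpenAI-compatible base URL with the supermemory proxy URL and keep using your provider API key as usual. The sketch below assumes Anthropic exposes an OpenAI-compatible endpoint at https://api.anthropic.com/v1; substitute your own provider's base URL and key.

const anthropicClient = new OpenAI({
  apiKey: process.env.ANTHROPIC_API_KEY,
  // The provider's base URL is appended after the supermemory proxy prefix.
  baseURL: "https://api.supermemory.ai/v3/https://api.anthropic.com/v1",
  defaultHeaders: {
    "x-api-key": process.env.SUPERMEMORY_API_KEY,
  },
});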