Inference API

A machine-readable endpoint that accepts inputs and returns AI model outputs on demand.

Also known as: model API, AI API, LLM endpoint

An inference API is a machine-readable endpoint that accepts structured inputs and returns AI model outputs in real time. The caller sends a request (text, image, structured data), the API runs it through an AI model, and returns a prediction, completion, classification, or other output.

In the AI marketplace, inference APIs are the most machine-consumable form of AI asset. They can be queried directly by other AI systems, enabling agent-to-agent commerce at the API layer. Pricing models include per-call, per-token, per-second of compute, or subscription tiers.