# Overview

Welcome to the Fluxions API. Our hosted endpoints cover three product surfaces:

- **Transcription** — `akro-v1`, our listening model: speech-to-text, speaker diarization, and non-speech events (breaths, laughter, hesitations) in one call. Production-ready today.
- **Text-to-Speech** — hosted **VUI** for expressive, low-latency TTS over HTTP or WebSocket. Live today — see [Speech](#speech).
- **Realtime Voice** — OpenAI Realtime-compatible WebSocket for end-to-end streaming voice conversations. *Coming soon.*

This page covers the basics that apply across all surfaces: authentication, base URL, and a health check.

## Authentication

Unless an endpoint is marked as not requiring authentication, every API request needs an API key. Include your API key in the `Authorization` header:

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/endpoint" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

headers = {'Authorization': 'YOUR_API_KEY'}
response = requests.get('https://api.fluxions.ai/endpoint', headers=headers)
data = response.json()
```

**JavaScript (.js)**
```javascript
const response = await fetch('https://api.fluxions.ai/endpoint', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const data = await response.json();
```

**Important**: Do not use the "Bearer " prefix. Include the API key directly in the Authorization header.

## Base URL

```
https://api.fluxions.ai
```

## GET /health — Health Check

Check that the API is up. The response reports the service status and the gateway host. *No authentication required.*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/health"
```

**Python (.py)**
```python
import requests

response = requests.get('https://api.fluxions.ai/health')
data = response.json()
print(f"Status: {data['status']}, Gateway: {data['gateway']}")
```

**JavaScript (.js)**
```javascript
const response = await fetch('https://api.fluxions.ai/health');
const data = await response.json();
console.log(`Status: ${data.status}, Gateway: ${data.gateway}`);
```

### Response

```json
{
  "status": "ok",
  "gateway": "api.fluxions.ai"
}
```



---

# Transcription

Our **akro-v1** model is a comprehensive listening model that performs:

- **Transcription** — Convert speech to text with high accuracy
- **Speaker Diarization** — Identify and separate different speakers ("who said what")
- **Non-Speech Detection** — Capture breathing, laughter, hesitation, and other contextual sounds

This makes it ideal for transcribing meetings, interviews, podcasts, and any audio where understanding the full context matters.

All transcription endpoints require authentication — see [Overview](#overview) for API key setup.

**Pricing:** $0.20 per hour of audio processed, billed by the second. See [pricing](/pricing).
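
As a rough worked example of the pricing above, the sketch below estimates the charge for a clip. The function name `estimate_transcription_cost` is hypothetical, and rounding partial seconds up is an assumption: the pricing line states only that billing is by the second.

```python
import math

RATE_USD_PER_HOUR = 0.20  # from the pricing line above


def estimate_transcription_cost(audio_seconds: float) -> float:
    """Estimate the charge in USD for a clip of the given duration.

    Assumption: partial seconds are rounded up to the next whole second.
    """
    billed_seconds = math.ceil(audio_seconds)
    return billed_seconds * RATE_USD_PER_HOUR / 3600


# A 30-minute recording (1800 s) comes to about ten cents.
print(f"${estimate_transcription_cost(1800.0):.2f}")
```

Treat the result as an estimate only; the authoritative figure is whatever appears on the pricing page and in your account.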

## POST /submit — Submit Transcription

Submit audio for processing and receive a job ID immediately. Poll `/transcriptions/{id}` for results including transcription, speaker diarization, and non-speech events.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `non_speech` | boolean | `false` | Include non-speech sounds |
| `filename` | string | `"audio"` | Name for the uploaded file |
| `cache` | boolean | `true` | Use cached results for identical files |

### Request

Body: raw audio file bytes.

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/akro/submit" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @audio.mp3
```

**Python (.py)**
```python
import requests

with open('audio.mp3', 'rb') as f:
    response = requests.post(
        'https://api.fluxions.ai/akro/submit',
        headers={'Authorization': 'YOUR_API_KEY', 'Content-Type': 'audio/mpeg'},
        data=f
    )

job = response.json()
job_id = job['id']
```

**JavaScript (.js)**
```javascript
// `audioFile` is a File or Blob; the endpoint expects the raw audio bytes as the body
const response = await fetch('https://api.fluxions.ai/akro/submit', {
  method: 'POST',
  headers: {'Authorization': 'YOUR_API_KEY', 'Content-Type': 'audio/mpeg'},
  body: audioFile
});

const job = await response.json();
const jobId = job.id;
```

### Response

```json
{
  "id": 124,
  "status": "submitted",
  "created_at": "2025-10-24T10:35:00.000Z",
  "original_audio_url": "https://...",
  "query_urls": {
    "get": "https://api.fluxions.ai/transcriptions/124",
    "status": "https://api.fluxions.ai/transcriptions/124"
  },
  "cached": false
}
```

### Workflow

1. Submit audio via `/submit` and receive a job ID
2. Poll `/transcriptions/{id}` to check status
3. When `status` is `"completed"`, retrieve full results
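
The workflow above can be sketched as a small polling loop. This is a minimal illustration using only the standard library, not an official client: the helper names `fetch_job` and `wait_for_transcription`, the 2-second interval, and the 10-minute timeout are all assumptions chosen for the example.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.fluxions.ai"


def fetch_job(job_id, api_key):
    """GET /transcriptions/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/transcriptions/{job_id}",
        headers={"Authorization": api_key},  # raw key, no "Bearer " prefix
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_transcription(job_id, fetch, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job is completed or failed, or raise TimeoutError.

    `fetch` is any callable that takes a job id and returns the job dict,
    which keeps the loop testable without network access.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error_message", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r}")
        sleep(interval)
```

A call might look like `wait_for_transcription(124, lambda i: fetch_job(i, "YOUR_API_KEY"))`. Passing the fetch function in, rather than hard-coding it, also makes it easy to swap in `requests` or a retrying client.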

## GET /transcriptions/{id} — Get Transcription Results

Retrieve the full results for a specific job: transcription, speaker diarization, and non-speech events.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `word_level_timestamps` | boolean | `false` | Include word-level timestamps in segments |

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/transcriptions/124" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions/124',
    headers={'Authorization': 'YOUR_API_KEY'}
)

result = response.json()
if result['status'] == 'completed':
    print(result['text'])
```

**JavaScript (.js)**
```javascript
const response = await fetch(
  'https://api.fluxions.ai/transcriptions/124',
  {
    headers: {'Authorization': 'YOUR_API_KEY'}
  }
);

const result = await response.json();
if (result.status === 'completed') {
  console.log(result.text);
}
```

### Response

```json
{
  "id": 124,
  "status": "completed",
  "created_at": "2025-10-24T10:35:00.000Z",
  "updated_at": "2025-10-24T10:35:20.000Z",
  "filename": "interview.mp3",
  "audio_duration": 300.0,
  "audio_format": "opus",
  "processing_time": 245.5,
  "language": "en",
  "non_speech": false,
  "num_chunks": 11,
  "num_segments": 25,
  "num_speakers": 2,
  "text": "SPEAKER_0: Yeah, let's actually start off exactly, where we initially began.\nSPEAKER_1: Sounds perfect. That makes complete sense to me.\nSPEAKER_0: So I started thinking about what if this is just a construct?",
  "segments": [
    {
      "speaker": "0",
      "text": "Yeah, let's actually start off exactly, where we initially began.",
      "start": 0.86,
      "end": 6.42,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "Sounds perfect",
      "start": 6.0,
      "end": 7.2,
      "segment_idx": 0
    },
    {
      "speaker": "1",
      "text": "That makes complete sense to me.",
      "start": 7.5,
      "end": 9.8,
      "segment_idx": 1
    }
  ],
  "audio_url": "https://...r2.cloudflarestorage.com/...",
  "cached": true
}
```

### Status Values

- `submitted` — Job has been submitted
- `processing` — Transcription in progress
- `completed` — Transcription finished successfully
- `failed` — Transcription failed (check `error_message`)

## GET /transcriptions — List Transcriptions

List all transcriptions for your account.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `limit` | integer | `50` | Number of results per page (max: 100) |
| `offset` | integer | `0` | Pagination offset |

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/transcriptions?limit=10&offset=0" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

response = requests.get(
    'https://api.fluxions.ai/transcriptions',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'limit': 10, 'offset': 0}
)

data = response.json()
print(f"Total: {data['total']}, Found: {len(data['transcriptions'])} transcriptions")
for t in data['transcriptions']:
    print(f"  ID {t['id']}: {t['filename']} - {t['status']}")
```

**JavaScript (.js)**
```javascript
const response = await fetch(
  'https://api.fluxions.ai/transcriptions?limit=10&offset=0',
  {
    headers: {'Authorization': 'YOUR_API_KEY'}
  }
);

const data = await response.json();
console.log(`Total: ${data.total}, Found: ${data.transcriptions.length} transcriptions`);
data.transcriptions.forEach(t => {
  console.log(`  ID ${t.id}: ${t.filename} - ${t.status}`);
});
```

### Response

```json
{
  "total": 150,
  "limit": 10,
  "offset": 0,
  "transcriptions": [
    {
      "id": 150,
      "status": "completed",
      "created_at": "2025-10-24T10:40:00.000Z",
      "filename": "interview.mp3",
      "audio_duration": 1800.0,
      "audio_format": "opus",
      "processing_time": 45.2,
      "num_speakers": 2,
      "num_segments": 142,
      "original_audio_url": "https://...",
      "language": "en"
    }
  ]
}
```
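
To walk the whole list rather than a single page, the `limit`, `offset`, and `total` fields can be combined as in the sketch below. The names `fetch_page` and `iter_transcriptions` are hypothetical helpers, not part of the API.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(limit, offset, api_key="YOUR_API_KEY"):
    """GET /transcriptions for one window and return the decoded body."""
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    req = urllib.request.Request(
        f"https://api.fluxions.ai/transcriptions?{query}",
        headers={"Authorization": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_transcriptions(fetch=fetch_page, limit=50):
    """Yield every transcription by advancing `offset` until `total` is reached."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        items = page["transcriptions"]
        yield from items
        offset += len(items)
        # Stop on an empty page too, in case `total` changes mid-iteration.
        if not items or offset >= page["total"]:
            break
```

Usage is a plain loop, for example `for t in iter_transcriptions(limit=100): print(t["id"])`.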

## Response Format

### Text Field

The `text` field contains the full transcription with speaker labels and optional non-speech events:

- **Speaker Labels**: `SPEAKER_0:`, `SPEAKER_1:`, etc. prefix each speaker's utterances
- **Line Breaks**: Newlines (`\n`) separate different speaker turns
- **Non-speech Events**: When enabled, events like `[breath]`, `[pause]` appear inline

**Example**:
```
SPEAKER_0: Yeah, let's start [breath] where we began.
SPEAKER_1: Sounds good. That makes sense.
SPEAKER_0: So I was thinking about [pause] what if this is a construct?
```
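
If you need the `text` field as structured turns, a small parser along these lines may help. It is a sketch based only on the format described above; `parse_turns` is a hypothetical helper, and treating an unlabeled line as a continuation of the previous turn is an assumption.

```python
import re

SPEAKER_LINE = re.compile(r"^SPEAKER_(\d+):\s*(.*)$")


def parse_turns(text):
    """Split the `text` field into (speaker_id, utterance) tuples."""
    turns = []
    for line in text.split("\n"):
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns and line.strip():
            # Assumption: an unlabeled line continues the previous turn.
            speaker, utterance = turns[-1]
            turns[-1] = (speaker, f"{utterance} {line.strip()}")
    return turns


sample = (
    "SPEAKER_0: Yeah, let's start [breath] where we began.\n"
    "SPEAKER_1: Sounds good. That makes sense."
)
for speaker, utterance in parse_turns(sample):
    print(speaker, "->", utterance)
```

The speaker ids come back as strings, matching the `speaker` values in the `segments` array.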

### Segments Array

The `segments` array provides precise timing and speaker information for each utterance:

- **speaker**: Speaker ID as a string (`"0"`, `"1"`, etc.)
- **text**: The spoken text for this segment (without non-speech events)
- **start**: Start time in seconds (decimal precision)
- **end**: End time in seconds (decimal precision)
- **segment_idx**: Sequential index for this segment
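
As one example of using these fields, the sketch below totals speaking time per speaker from `start` and `end`. The helper name `talk_time_by_speaker` is hypothetical, and the sample data is copied from the response example earlier on this page.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum (end - start) per speaker, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = [
    {"speaker": "0", "text": "Yeah, let's actually start off exactly, where we initially began.",
     "start": 0.86, "end": 6.42, "segment_idx": 0},
    {"speaker": "1", "text": "Sounds perfect", "start": 6.0, "end": 7.2, "segment_idx": 0},
    {"speaker": "1", "text": "That makes complete sense to me.", "start": 7.5, "end": 9.8, "segment_idx": 1},
]
for speaker, seconds in talk_time_by_speaker(segments).items():
    print(f"speaker {speaker}: {seconds:.2f} s")
```

Segments from different speakers can overlap in time (as the first two do here), so the per-speaker totals may add up to more than the audio duration.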

## Non-Speech Events

When `non_speech=true`, our listening model captures various non-speech sounds and events that provide additional context to the conversation.

### Common Non-Speech Sounds

| Event | Tag | Description | Example Usage |
|-------|-----|-------------|---------------|
| **Breath** | `[breath]` | Audible breathing sounds | `...end of sentence. [breath] Now this is important.` |
| **Laugh** | `[laugh]` or `hahaha` | Laughter - can be written as text or tagged for longer laughs | `Oh wow! hahaha [breath] that's hilarious.` |
| **Hesitation** | `[hesitation]` or `[hesitate]` | Unclear thinking noises or mouth sounds while pausing - not specific words | `Well [hesitation] um I'm not really sure.` |
| **Pause** | `[pause]` | Unnaturally long, noticeable pause (e.g., looking something up) | `Let me just uh... [pause] Let me look this up.` |
| **Environment** | `[env]` | Background noise or environmental sounds | `I was thinking [env] about what you said.` |
| **Tut** | `[tut]` | Tongue click or lip smack sound | `[tut] That's not quite right.` |
| **Sigh** | `[sigh]` | Expressive exhale sound | `[sigh] I suppose you're right.` |
| **Sniff** | `[sniff]` | Nasal inhale or sniffing sound | `[sniff] Something smells good in here.` |
| **Cough** | `[cough]` | Coughing sound | `Sorry, excuse me [cough] as I was saying...` |

### Usage Notes

- Non-speech events are placed inline with the transcribed text
- Events appear at their natural position in the conversation flow
- Word elongation is marked with ellipsis: `um... so... I think...`
- Emphasis on words uses asterisks: `I *really* think so`
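
When you want both the clean words and the events, the bracketed tags can be separated with a regular expression. This is a sketch for a single utterance, assuming tags are lowercase words in square brackets as in the table above; `split_events` is a hypothetical helper, and untagged laughter written as `hahaha` is left in the text.

```python
import re

EVENT_TAG = re.compile(r"\[([a-z]+)\]")


def split_events(utterance):
    """Return (utterance_without_tags, [event names in order of appearance])."""
    events = EVENT_TAG.findall(utterance)
    cleaned = EVENT_TAG.sub("", utterance)
    # Removing a tag leaves a doubled space behind; collapse it.
    cleaned = re.sub(r" {2,}", " ", cleaned).strip()
    return cleaned, events


print(split_events("Sorry, excuse me [cough] as I was saying..."))
```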



---

# Speech

Hosted **VUI** — expressive, low-latency text-to-speech. Send text, get back audio in a natural voice, with support for non-verbal cues like `[sigh]` and `[laugh]`.

Two ways to render text:

- **HTTP** (`POST /v1/tts`) — one request, one render. Simplest to integrate.
- **WebSocket** (`/v1/tts/ws`) — keep a warm socket open across renders so each one skips the TLS/TCP handshake and reaches first audio sooner. Use this for interactive UIs.

**Pricing:** $10 per 1M characters (≈ $0.45 per hour of audio). See [pricing](/pricing).

## Base URL

Speech is served through the unified Fluxions API gateway under the `/vui` namespace:

```
https://api.fluxions.ai/vui
```

## Authentication

Built-in voices are **public** and need no API key. A private voice you've cloned requires your credential in the `Authorization` header: `Bearer <jwt>` for a signed-in session token, or your raw API key. See [Voices](#voices) below.

## GET /voices — List Voices

List the built-in voices available to everyone. *No authentication required.*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/vui/voices"
```

**Python (.py)**
```python
import requests

voices = requests.get('https://api.fluxions.ai/vui/voices').json()['voices']
for v in voices:
    print(v['voice_id'], '—', v['preview_text'][:50])
```

**JavaScript (.js)**
```javascript
const { voices } = await fetch('https://api.fluxions.ai/vui/voices').then(r => r.json());
voices.forEach(v => console.log(v.voice_id, '—', v.preview_text.slice(0, 50)));
```

### Response

```json
{
  "voices": [
    { "voice_id": "maeve.h736bab09a", "preview_text": "I just, I want you to know how proud I am of you..." },
    { "voice_id": "abraham.h736bab09a", "preview_text": "I've finished analysing the document you uploaded..." },
    { "voice_id": "harry.h736bab09a", "preview_text": "Hello, this is Harry. I'm calling you..." }
  ]
}
```

Pass any `voice_id` as the `voice` field when rendering.

## POST /v1/tts — Render (HTTP)

Synthesize speech from text. Returns a complete WAV by default, or streams audio chunk-by-chunk when `stream=1`.

### Parameters

JSON body:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `voice` | string | *(required)* | A `voice_id` from `GET /voices` |
| `input` | string | *(required)* | Text to speak. Supports non-verbal cues (see below) |
| `temperature` | float | `0.9` | Sampling temperature — higher is more varied |
| `response_format` | string | `"wav"` | `"wav"` (complete file) or `"pcm"` (raw s16le @ 24 kHz) |
| `stream` | boolean | `false` | Stream audio as it's generated instead of buffering the whole file |
| `max_secs` | float | *(auto)* | Hard ceiling on output length. Auto-estimated from text length when omitted |
| `verify_chunks` | boolean | `true` | Re-checks each rendered chunk with a fast speech-to-text pass and re-renders any that misread the text. Improves reliability at the cost of latency. Set `false` for the lowest-latency stream (see [Streaming](#streaming)) |

### Request

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/vui/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"voice": "maeve.h736bab09a", "input": "[sigh] fine, I will say it one more time."}' \
  --output speech.wav
```

**Python (.py)**
```python
import requests

r = requests.post(
    'https://api.fluxions.ai/vui/v1/tts',
    json={'voice': 'maeve.h736bab09a', 'input': '[sigh] fine, I will say it one more time.'}
)
with open('speech.wav', 'wb') as f:
    f.write(r.content)
```

**JavaScript (.js)**
```javascript
const r = await fetch('https://api.fluxions.ai/vui/v1/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ voice: 'maeve.h736bab09a', input: '[sigh] fine, I will say it one more time.' })
});
const wav = await r.blob();
const url = URL.createObjectURL(wav);
new Audio(url).play();
```

### Response

`200 OK` with the audio bytes. `Content-Type` is `audio/wav` (or `audio/L16` when `response_format` is `"pcm"`).

### Streaming

Add `stream=1` (query param or body field) to receive audio as it's generated, delivered as chunked transfer encoding.

By default (`verify_chunks: true`) each chunk is checked — and re-rendered if it misreads the text — *before* it streams, so the first audio lands once the first chunk is rendered and verified (~1 s for a typical sentence). Set `verify_chunks: false` to stream each chunk the instant the model produces it, unverified: first bytes then land within ~80 ms.

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/vui/v1/tts?stream=1" \
  -H "Content-Type: application/json" \
  -d '{"voice": "maeve.h736bab09a", "input": "Streaming starts playing almost immediately."}' \
  --output speech.wav
```

**Python (.py)**
```python
import requests

with requests.post(
    'https://api.fluxions.ai/vui/v1/tts?stream=1',
    json={'voice': 'maeve.h736bab09a', 'input': 'Streaming starts playing almost immediately.'},
    stream=True,
) as r, open('speech.wav', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
```

**JavaScript (.js)**
```javascript
const r = await fetch('https://api.fluxions.ai/vui/v1/tts?stream=1', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ voice: 'maeve.h736bab09a', input: 'Streaming starts playing almost immediately.' }),
});
const reader = r.body.getReader();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // `value` is a Uint8Array chunk of the streaming WAV — append or play as it arrives
}
```

## WebSocket /v1/tts/ws — Render (warm socket)

Identical render logic to `POST /v1/tts`, but the socket stays open between renders. Hold it open and the TLS/TCP/tunnel handshake is paid once — each subsequent `speak` goes straight to synthesis. Ideal for typing UIs or back-to-back lines.

Audio is delivered as **binary frames of s16le PCM, mono, 24 kHz** (no WAV header — assemble it yourself if you need a file).

### Protocol

**Client → server** (text JSON):

```json
{ "type": "speak", "voice": "<id>", "input": "<text>", "temperature": 0.9, "max_secs": 0, "verify_chunks": true, "token": "Bearer <jwt>" }
{ "type": "session.close" }
```

`temperature`, `max_secs`, and `verify_chunks` are optional. `verify_chunks` defaults to `true`; set it `false` for the lowest-latency stream (see [Streaming](#streaming)).

**Authentication.** Built-in voices are public — omit `token`. A private cloned voice needs `token` set to the *same value you'd put in the `Authorization` header*: `Bearer <clerk-jwt>` for a signed-in session, or your raw API key. It rides in the `speak` message because browsers can't set headers on a WebSocket. The token is checked per `speak`, so you can mix public and private voices on one socket.

**Server → client:**

| Message | Meaning |
|---------|---------|
| `{"type": "start"}` | The worker stream opened — audio frames follow |
| *(binary frame)* | A chunk of s16le PCM @ 24 kHz |
| `{"type": "done"}` | Current render finished — socket stays open for the next `speak` |
| `{"type": "error", "message": "..."}` | Render failed (socket stays open) |

One render = one `speak` → `start` → zero or more binary PCM frames → `done`. Send another `speak` on the same socket to render again.
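
The message flow can be modeled independently of any WebSocket library, which makes it easy to unit-test. The sketch below is illustrative only: `build_speak` and `collect_render` are hypothetical helpers, and raising `ConnectionError` when the frames run out before `done` is a design choice of this example rather than documented behavior.

```python
import json


def build_speak(voice, text, token=None, **options):
    """Build the JSON text frame for one render request."""
    message = {"type": "speak", "voice": voice, "input": text, **options}
    if token is not None:
        message["token"] = token  # only needed for private cloned voices
    return json.dumps(message)


def collect_render(frames):
    """Consume the frames of one render and return its PCM bytes.

    `frames` is any iterable of received messages: `bytes` for binary
    frames, `str` for JSON text frames. Raises RuntimeError on `error`.
    """
    pcm = bytearray()
    for frame in frames:
        if isinstance(frame, bytes):
            pcm += frame
            continue
        message = json.loads(frame)
        if message["type"] == "done":
            return bytes(pcm)
        if message["type"] == "error":
            raise RuntimeError(message.get("message", "render failed"))
        # "start" carries no payload, so there is nothing to do for it.
    raise ConnectionError("frames ended before 'done'")
```

Because `collect_render` stops at `done` without closing anything, the same iterator can be handed to it again after sending the next `speak`.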

### Request

**JavaScript (.js)**
```javascript
const ws = new WebSocket('wss://api.fluxions.ai/vui/v1/tts/ws');
ws.binaryType = 'arraybuffer';

const chunks = [];
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'speak', voice: 'maeve.h736bab09a', input: '[laugh] oh, you are serious?' }));
});
ws.addEventListener('message', (ev) => {
  if (typeof ev.data === 'string') {
    const m = JSON.parse(ev.data);
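    if (m.type === 'done') {
      // chunks now hold the full s16le PCM @ 24 kHz; feed to WebAudio or wrap in a WAV
      ws.send(JSON.stringify({ type: 'session.close' }));
    } else if (m.type === 'error') {
      console.error('render failed:', m.message); // socket stays open for another speak
    }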
    return;
  }
  chunks.push(new Int16Array(ev.data)); // raw PCM frame
});
```

**Python (.py)**
```python
import asyncio, json, websockets

async def render(text, voice='maeve.h736bab09a'):
    pcm = bytearray()
    async with websockets.connect('wss://api.fluxions.ai/vui/v1/tts/ws') as ws:
        await ws.send(json.dumps({'type': 'speak', 'voice': voice, 'input': text}))
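        async for msg in ws:
            if isinstance(msg, bytes):
                pcm += msg                      # s16le PCM @ 24 kHz
                continue
            event = json.loads(msg)
            if event['type'] == 'done':
                break
            if event['type'] == 'error':        # socket stays open, so stop waiting
                raise RuntimeError(event.get('message', 'render failed'))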
    return bytes(pcm)

audio = asyncio.run(render('[sigh] so you want to force me to say things.'))
```

## Non-Verbal Cues

Wrap a cue in square brackets inside `input` and the model renders it as an expressive sound rather than reading the word aloud:

| Cue | Effect |
|-----|--------|
| `[sigh]` | Audible sigh |
| `[laugh]` | Laughter |
| `[gasp]` | Sharp intake of breath |
| `[sniff]` | Sniffle |
| `[cough]` | Cough |
| `[hesitate]` | Filler / thinking sound |

**Example**: `"[gasp] you did NOT just put pineapple on that pizza! [laugh] okay, okay."`
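
Before sending text, it can be useful to flag bracketed cues that are not in the table above. The check below is a sketch: `unknown_cues` is a hypothetical helper, and the set of known cues is copied from this table only, so the service may well accept others.

```python
import re

# Cues listed in the table above; the service may accept more.
KNOWN_CUES = {"sigh", "laugh", "gasp", "sniff", "cough", "hesitate"}


def unknown_cues(text):
    """Return bracketed cues in `text` that are not in KNOWN_CUES."""
    found = re.findall(r"\[([^\[\]]+)\]", text)
    return [cue for cue in found if cue.lower() not in KNOWN_CUES]


print(unknown_cues("[gasp] you did NOT just do that! [giggle] okay."))  # ['giggle']
```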

## Voices

Built-in voices (`GET /voices`) are public. You can also **clone** a custom voice from a short reference clip. Cloned voices are private to your account and require your credential on every render: pass it in the `Authorization` header (`Bearer <token>`) for HTTP, or in the `token` field of the `speak` message for the WebSocket.

### POST /v1/voices — Clone a Voice

Upload a reference clip plus its transcript; the model encodes a private voice you can render with. *Requires authentication.* Sent as `multipart/form-data`.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `audio` | file | yes | Reference clip (`wav`/`opus`/etc.). A few clean seconds is enough. Max 25 MB |
| `text` | string | no | Exact transcript of the reference clip. **Omit it and we transcribe the clip for you** before cloning |
| `name` | string | no | Display label (defaults to the filename) |

> Leave `text` out and the server runs your clip through transcription automatically — so the simplest clone is just an `audio` file. Pass `text` yourself when you want exact control over the transcript.

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/vui/v1/voices" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "audio=@reference.wav" \
  -F "text=This is exactly what the reference clip says." \
  -F "name=My Voice"
```

**Python (.py)**
```python
import requests

with open('reference.wav', 'rb') as clip:  # close the file once the upload finishes
    r = requests.post(
        'https://api.fluxions.ai/vui/v1/voices',
        headers={'Authorization': 'Bearer YOUR_TOKEN'},
        data={'text': 'This is exactly what the reference clip says.', 'name': 'My Voice'},
        files={'audio': ('reference.wav', clip, 'audio/wav')},
    )
voice_id = r.json()['voice_id']
```

**JavaScript (.js)**
```javascript
const fd = new FormData();
fd.append('audio', fileInput.files[0]);
fd.append('text', 'This is exactly what the reference clip says.');
fd.append('name', 'My Voice');

const { voice_id } = await fetch('https://api.fluxions.ai/vui/v1/voices', {
  method: 'POST',
  headers: { Authorization: 'Bearer YOUR_TOKEN' },
  body: fd,
}).then(r => r.json());
```

Response: `{ "voice_id": "u-<user>-<hash>", "name": "My Voice", "frames": 173, "seconds": 13.8 }`. Pass the returned `voice_id` as `voice` in any render call (with your token).

### GET /v1/voices/mine — List Your Cloned Voices

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/vui/v1/voices/mine" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

**Python (.py)**
```python
import requests

voices = requests.get(
    'https://api.fluxions.ai/vui/v1/voices/mine',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
).json()['voices']
```

**JavaScript (.js)**
```javascript
const { voices } = await fetch('https://api.fluxions.ai/vui/v1/voices/mine', {
  headers: { Authorization: 'Bearer YOUR_TOKEN' },
}).then(r => r.json());
```

Returns `{ "voices": [ { "voice_id": "u-...-ab12cd34", "name": "My Voice" } ] }`.

### POST /v1/voices/delete — Remove a Cloned Voice

**Bash (.sh)**
```bash
curl -X POST "https://api.fluxions.ai/vui/v1/voices/delete" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"voice_id": "u-...-ab12cd34"}'
```

**Python (.py)**
```python
import requests

requests.post(
    'https://api.fluxions.ai/vui/v1/voices/delete',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
    json={'voice_id': 'u-...-ab12cd34'},
)
```

**JavaScript (.js)**
```javascript
await fetch('https://api.fluxions.ai/vui/v1/voices/delete', {
  method: 'POST',
  headers: { Authorization: 'Bearer YOUR_TOKEN', 'Content-Type': 'application/json' },
  body: JSON.stringify({ voice_id: 'u-...-ab12cd34' }),
});
```

> **Note:** cloned voices currently live in the running server's memory, not a database — they're tied to your account but are not guaranteed to survive a server restart. Re-upload if a `voice_id` stops resolving.

## Output Format

- **Sample rate**: 24,000 Hz
- **Channels**: mono
- **Sample format**: signed 16-bit little-endian PCM
- **HTTP `wav`**: PCM wrapped in a standard WAV container
- **HTTP `pcm`** / **WebSocket binary frames**: raw s16le PCM (no header)
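
Since raw PCM has no header, you need to add one before most players will open it. A minimal sketch using Python's standard `wave` module, with the format constants taken from the list above; `pcm_to_wav` and `duration_seconds` are hypothetical helper names.

```python
import io
import wave

SAMPLE_RATE = 24_000   # Hz
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # bytes per sample (16-bit)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw s16le mono 24 kHz PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback length of a raw PCM buffer in this format."""
    return len(pcm) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)


one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
print(len(pcm_to_wav(one_second_of_silence)), duration_seconds(one_second_of_silence))
```

The same helper works for the bytes collected from WebSocket binary frames and for an HTTP response requested with `response_format` set to `"pcm"`.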



---

# History

The History API is one read-only surface over everything you've done on the platform — **transcriptions**, **TTS renders**, and **voice conversations** — under a single host. Use it to list, page, filter, and search your activity, and to fetch download links for the underlying audio and transcripts.

All history endpoints require authentication — see [Overview](#overview) for API key setup.

**Base URL:** `https://api.fluxions.ai`

## One shape for everything

Every **list** response uses the same envelope:

```json
{
  "object": "list",
  "page": 1,
  "limit": 20,
  "total": 137,
  "has_more": true,
  "data": [ /* items */ ]
}
```

Every **item** carries an `object` field telling you its type (`"transcription"`, `"tts"`, or `"conversation"`) plus its native `id`. To fetch one item's detail, call the matching collection with the id: `/history/transcriptions/{id}`, `/history/tts/{id}`, or `/history/conversations/{id}` (e.g. `/history/tts/123`). Timestamps are ISO-8601 UTC; costs are in US dollars.
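
The mapping from an item's `object` value to its detail endpoint can be sketched as below. The collection names are taken from the endpoints documented on this page; `detail_path` itself is a hypothetical helper.

```python
# Collection path segment for each item type, per the endpoints on this page.
COLLECTION_BY_OBJECT = {
    "transcription": "transcriptions",
    "tts": "tts",
    "conversation": "conversations",
}


def detail_path(item):
    """Return the detail endpoint path for a history item."""
    collection = COLLECTION_BY_OBJECT[item["object"]]
    return f"/history/{collection}/{item['id']}"


print(detail_path({"object": "tts", "id": 123}))                  # /history/tts/123
print(detail_path({"object": "conversation", "id": "sess_abc"}))  # /history/conversations/sess_abc
```

An explicit lookup table avoids guessing at plurals, since `tts` keeps the same name in its collection path.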

## Shared query parameters

These work on every collection (and the unified feed):

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `page` | integer | `1` | Page number (1-based) |
| `limit` | integer | `20` | Results per page (max: 100) |
| `order` | string | `desc` | Sort by time: `asc` or `desc` |
| `since` | string | — | Only items at/after this time (ISO-8601 or epoch seconds) |
| `until` | string | — | Only items at/before this time (ISO-8601 or epoch seconds) |

Collection-specific filters: `voice` (tts, conversations), `status` (transcriptions), `type` (the unified feed).
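
Paging through a collection comes down to incrementing `page` until `has_more` is false. The sketch below assumes only the envelope and shared parameters described above; `fetch_history_page` and `iter_history` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.fluxions.ai"


def fetch_history_page(path, params, api_key="YOUR_API_KEY"):
    """GET one page of a history collection and return the envelope."""
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_history(path, fetch=fetch_history_page, limit=100, **filters):
    """Yield every item in a collection by following `has_more`."""
    page = 1
    while True:
        envelope = fetch(path, {"page": page, "limit": limit, **filters})
        yield from envelope["data"]
        if not envelope["has_more"]:
            break
        page += 1
```

Filters pass straight through as keyword arguments, for example `iter_history("/history/tts", voice="maeve.en-us", since="2026-06-01T00:00:00Z")`.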

## GET /history — Unified Feed

A merged, reverse-chronological feed across all three types. Filter the streams with `type` (comma-separated).

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `type` | string | *(all)* | Restrict to `transcription`, `tts`, and/or `conversation` (csv) |

*(plus all shared parameters above)*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/history?limit=10&type=tts,conversation" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

r = requests.get(
    'https://api.fluxions.ai/history',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'limit': 10, 'type': 'tts,conversation'},
)
for item in r.json()['data']:
    print(item['object'], item['id'], item['created_at'])
```

**JavaScript (.js)**
```javascript
const res = await fetch('https://api.fluxions.ai/history?limit=10&type=tts,conversation', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const { data } = await res.json();
data.forEach(i => console.log(i.object, i.id, i.created_at));
```

### Response

```json
{
  "object": "list",
  "page": 1, "limit": 10, "total": 84, "has_more": true,
  "data": [
    { "object": "conversation", "id": "sess_abc", "created_at": "2026-06-29T10:40:00Z",
      "cost_usd": null, "voice": "maeve.en-us", "duration_secs": 312.4, "turn_count": 18 },
    { "object": "tts", "id": 123, "created_at": "2026-06-29T10:32:00Z",
      "cost_usd": 0.0123, "voice": "maeve.en-us", "chars": 842, "audio_secs": 58.4 }
  ]
}
```

> The feed is lightweight: it does **not** include presigned `download_url`s. Use the typed collection or detail endpoints to get them.

## GET /history/transcriptions — Transcription History

List your transcriptions.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `status` | string | — | Filter by status (e.g. `completed`) |
| `include_download_url` | boolean | `false` | Include a presigned audio URL per item |

*(plus all shared parameters)*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/history/transcriptions?status=completed&limit=5" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

r = requests.get(
    'https://api.fluxions.ai/history/transcriptions',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'status': 'completed', 'limit': 5},
)
print(r.json()['total'], 'transcriptions')
```

**JavaScript (.js)**
```javascript
const res = await fetch('https://api.fluxions.ai/history/transcriptions?status=completed&limit=5', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
console.log((await res.json()).total, 'transcriptions');
```

### Response

```json
{
  "object": "list",
  "page": 1, "limit": 5, "total": 42, "has_more": true,
  "data": [
    {
      "object": "transcription",
      "id": 456,
      "created_at": "2026-06-29T10:35:00Z",
      "cost_usd": 0.10,
      "status": "completed",
      "filename": "interview.mp3",
      "audio_duration_secs": 1800.0,
      "audio_format": "opus",
      "language": "en",
      "num_speakers": 2,
      "num_segments": 142
    }
  ]
}
```

## GET /history/transcriptions/{id} — One Transcription

Returns the full record with presigned `download_url` (audio), `text_url`, and `segments_url`. `404` if it isn't yours.

```bash
curl "https://api.fluxions.ai/history/transcriptions/456" \
  -H "Authorization: YOUR_API_KEY"
```

## GET /history/tts — TTS Render History

List your text-to-speech renders.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `voice` | string | — | Filter by voice id |
| `include_download_url` | boolean | `true` | Include a presigned Opus URL per item |

*(plus all shared parameters)*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/history/tts?voice=maeve.en-us&limit=10" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

r = requests.get(
    'https://api.fluxions.ai/history/tts',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'voice': 'maeve.en-us', 'limit': 10},
)
for render in r.json()['data']:
    print(render['id'], render['chars'], render['download_url'])
```

**JavaScript (.js)**
```javascript
const res = await fetch('https://api.fluxions.ai/history/tts?voice=maeve.en-us&limit=10', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const { data } = await res.json();
data.forEach(r => console.log(r.id, r.chars, r.download_url));
```

### Response

```json
{
  "object": "list",
  "page": 1, "limit": 10, "total": 60, "has_more": true,
  "data": [
    {
      "object": "tts",
      "id": 123,
      "created_at": "2026-06-29T10:32:00Z",
      "cost_usd": 0.0123,
      "voice": "maeve.en-us",
      "chars": 842,
      "audio_secs": 58.4,
      "download_url": "https://...r2.cloudflarestorage.com/...opus"
    }
  ]
}
```

## GET /history/tts/{id} — One Render

Returns one render with a fresh signed `download_url`. `404` if it isn't yours.

```bash
curl "https://api.fluxions.ai/history/tts/123" \
  -H "Authorization: YOUR_API_KEY"
```

## GET /history/conversations — Conversation History

List your voice conversations (agent calls).

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `voice` | string | — | Filter by voice id |

*(plus all shared parameters)*

### Request

**Bash (.sh)**
```bash
curl "https://api.fluxions.ai/history/conversations?limit=10" \
  -H "Authorization: YOUR_API_KEY"
```

**Python (.py)**
```python
import requests

r = requests.get(
    'https://api.fluxions.ai/history/conversations',
    headers={'Authorization': 'YOUR_API_KEY'},
    params={'limit': 10},
)
for c in r.json()['data']:
    print(c['id'], c['turn_count'], c['duration_secs'])
```

**JavaScript (.js)**
```javascript
const res = await fetch('https://api.fluxions.ai/history/conversations?limit=10', {
  headers: {'Authorization': 'YOUR_API_KEY'}
});
const { data } = await res.json();
data.forEach(c => console.log(c.id, c.turn_count, c.duration_secs));
```

### Response

```json
{
  "object": "list",
  "page": 1, "limit": 10, "total": 23, "has_more": true,
  "data": [
    {
      "object": "conversation",
      "id": "sess_abc123",
      "created_at": "2026-06-29T10:40:00Z",
      "cost_usd": null,
      "voice": "maeve.en-us",
      "started_at": "2026-06-29T10:40:00Z",
      "ended_at": "2026-06-29T10:45:12Z",
      "duration_secs": 312.4,
      "turn_count": 18
    }
  ]
}
```

## GET /history/conversations/{id} — One Conversation

Returns the session plus its turn-by-turn transcript. `404` if it isn't yours.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `include_turns` | boolean | `true` | Include the transcript turns |
| `include_tool_calls` | boolean | `false` | Include tool invocations (calendar, email, …) |
| `turns_limit` | integer | `500` | Max turns to return (max: 2000) |

### Request

```bash
curl "https://api.fluxions.ai/history/conversations/sess_abc123?include_tool_calls=true" \
  -H "Authorization: YOUR_API_KEY"
```

### Response

```json
{
  "object": "conversation",
  "id": "sess_abc123",
  "created_at": "2026-06-29T10:40:00Z",
  "voice": "maeve.en-us",
  "duration_secs": 312.4,
  "turn_count": 18,
  "turns": [
    { "object": "conversation_turn", "id": 9001, "session_id": "sess_abc123",
      "role": "user", "text": "What's on my calendar today?", "created_at": "2026-06-29T10:40:05Z" },
    { "object": "conversation_turn", "id": 9002, "session_id": "sess_abc123",
      "role": "assistant", "text": "You have two meetings...", "created_at": "2026-06-29T10:40:08Z" }
  ],
  "tool_calls": [
    { "object": "tool_call", "id": 51, "tool": "calendar",
      "args": {"range": "today"}, "result": "2 events", "created_at": "2026-06-29T10:40:07Z" }
  ]
}
```
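
Turns and tool calls arrive as separate arrays, so reading a session in order means merging them by `created_at`. A minimal sketch, with hypothetical helper names (`parse_ts`, `timeline`, `render`) and sample data abbreviated from the response above:

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO-8601 UTC timestamp such as 2026-06-29T10:40:05Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timeline(conversation):
    """Merge turns and tool calls into one list ordered by `created_at`."""
    events = conversation.get("turns", []) + conversation.get("tool_calls", [])
    return sorted(events, key=lambda e: parse_ts(e["created_at"]))


def render(event):
    """Format one timeline entry as a single line of text."""
    if event["object"] == "tool_call":
        return f"[tool:{event['tool']}] {event['result']}"
    return f"{event['role']}: {event['text']}"


conversation = {
    "turns": [
        {"object": "conversation_turn", "id": 9001, "role": "user",
         "text": "What's on my calendar today?", "created_at": "2026-06-29T10:40:05Z"},
        {"object": "conversation_turn", "id": 9002, "role": "assistant",
         "text": "You have two meetings...", "created_at": "2026-06-29T10:40:08Z"},
    ],
    "tool_calls": [
        {"object": "tool_call", "id": 51, "tool": "calendar",
         "result": "2 events", "created_at": "2026-06-29T10:40:07Z"},
    ],
}
for event in timeline(conversation):
    print(render(event))
```

Using `.get` with an empty default means the same code works when `include_turns` or `include_tool_calls` is off and the corresponding array is absent.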

## GET /history/conversations/search — Search Turns

Full-text search across your conversation turns (Postgres `websearch_to_tsquery`).

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `q` | string | *(required)* | Search query |
| `limit` | integer | `20` | Max results (max: 100) |

### Request

```bash
curl "https://api.fluxions.ai/history/conversations/search?q=dentist+appointment" \
  -H "Authorization: YOUR_API_KEY"
```
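
Queries that contain spaces, quotes, or operators need URL encoding. The sketch below builds the request URL with the standard library; `search_url` is a hypothetical helper. PostgreSQL's `websearch_to_tsquery` understands quoted phrases and a leading `-` for exclusion, but whether the endpoint passes your query through unchanged is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fluxions.ai"


def search_url(query, limit=20, path="/history/conversations/search"):
    """Build a search URL with the query percent-encoded."""
    return f"{BASE_URL}{path}?{urlencode({'q': query, 'limit': limit})}"


print(search_url("dentist appointment"))
print(search_url('"dentist appointment" -cancel'))
```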

### Response

```json
{
  "object": "list",
  "query": "dentist appointment",
  "data": [
    { "object": "conversation_turn", "id": 9100, "session_id": "sess_def456",
      "role": "user", "text": "remind me about the dentist appointment",
      "created_at": "2026-06-20T14:02:00Z" }
  ]
}
```

## GET /history/search — Cross-Domain Search

Search across your whole history in one call. Conversation turns are matched by full text; transcriptions are matched by filename (their text lives in object storage, not the database). Results are type-tagged via `object`.

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `q` | string | *(required)* | Search query |
| `limit` | integer | `20` | Max results per domain (max: 100) |

### Request

```bash
curl "https://api.fluxions.ai/history/search?q=interview" \
  -H "Authorization: YOUR_API_KEY"
```

### Response

```json
{
  "object": "list",
  "query": "interview",
  "data": [
    { "object": "conversation_turn", "id": 9200, "session_id": "sess_ghi",
      "role": "assistant", "text": "...the interview went well...", "created_at": "2026-06-25T09:00:00Z" },
    { "object": "transcription", "id": 456, "created_at": "2026-06-29T10:35:00Z",
      "status": "completed", "filename": "interview.mp3", "audio_duration_secs": 1800.0 }
  ]
}
```
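
Because results of different types share one `data` array, a common first step is to bucket them by `object`. A small sketch, with `group_by_object` as a hypothetical helper and sample data abbreviated from the response above:

```python
from collections import defaultdict


def group_by_object(results):
    """Bucket search results by their `object` type tag."""
    groups = defaultdict(list)
    for item in results:
        groups[item["object"]].append(item)
    return dict(groups)


data = [
    {"object": "conversation_turn", "id": 9200, "session_id": "sess_ghi"},
    {"object": "transcription", "id": 456, "filename": "interview.mp3"},
]
for kind, items in group_by_object(data).items():
    print(kind, [item["id"] for item in items])
```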


