Overview
Welcome to the Fluxions API. Our hosted endpoints cover three product surfaces:
- Transcription: akro-v1, our listening model, handles speech-to-text, speaker diarization, and non-speech events (breaths, laughter, hesitations) in one call. Production-ready today.
- Text-to-Speech: hosted VUI for expressive, low-latency TTS over HTTP or WebSocket. Live today; see Speech.
- Realtime Voice: OpenAI Realtime-compatible WebSocket for end-to-end streaming voice conversations. Coming soon.
This page covers the basics that apply across all surfaces: authentication, base URL, and a health check.
Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header:
curl:

    curl "https://api.fluxions.ai/endpoint" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    headers = {'Authorization': 'YOUR_API_KEY'}
    response = requests.get('https://api.fluxions.ai/endpoint', headers=headers)
    data = response.json()

JavaScript:

    const response = await fetch('https://api.fluxions.ai/endpoint', {
      headers: {'Authorization': 'YOUR_API_KEY'}
    });
    const data = await response.json();
Important: Do not use the "Bearer " prefix. Include the API key directly in the Authorization header.
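The prefix rule above is easy to trip over when a key is copied from a tool that prepends "Bearer ". A minimal sketch of a guard, assuming a hypothetical helper named `auth_headers` (not part of any Fluxions SDK), could look like this:

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for key-authenticated endpoints.

    Hypothetical helper: sends the raw key and rejects a "Bearer " prefix,
    as the note above requires.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    if key.lower().startswith("bearer "):
        raise ValueError('Pass the raw API key without a "Bearer " prefix')
    return {"Authorization": key}
```

The returned dict can be passed as `headers=` to `requests.get` or `requests.post`.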
Base URL
https://api.fluxions.ai
GET /health — Health Check
Check the API status and version information. No authentication required.
Request
curl:

    curl "https://api.fluxions.ai/health"

Python:

    import requests

    response = requests.get('https://api.fluxions.ai/health')
    data = response.json()
    print(f"Status: {data['status']}, Gateway: {data['gateway']}")

JavaScript:

    const response = await fetch('https://api.fluxions.ai/health');
    const data = await response.json();
    console.log(`Status: ${data.status}, Gateway: ${data.gateway}`);

Response

    {
      "status": "ok",
      "gateway": "api.fluxions.ai"
    }
Transcription
Our akro-v1 model is a comprehensive listening model that performs:
- Transcription — Convert speech to text with high accuracy
- Speaker Diarization — Identify and separate different speakers ("who said what")
- Non-Speech Detection — Capture breathing, laughter, hesitation, and other contextual sounds
This makes it ideal for transcribing meetings, interviews, podcasts, and any audio where understanding the full context matters.
All transcription endpoints require authentication — see Overview for API key setup.
Pricing: $0.20 per hour of audio processed, billed by the second. See pricing.
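As a rough illustration of per-second billing at this rate, the sketch below estimates the cost of a clip. Rounding partial seconds up and rounding the result to six decimals are assumptions made for the example, not documented behavior, and `estimate_transcription_cost` is a hypothetical name.

```python
import math

RATE_USD_PER_HOUR = 0.20  # from the pricing line above


def estimate_transcription_cost(audio_seconds: float) -> float:
    """Estimate the charge for a clip billed by the second.

    Assumption: partial seconds are rounded up before billing.
    """
    billable_seconds = math.ceil(audio_seconds)
    return round(billable_seconds * RATE_USD_PER_HOUR / 3600, 6)
```

Under these assumptions a 30-minute file (1800 seconds) comes to about $0.10.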
POST /submit — Submit Transcription
Submit audio for processing and receive a job ID immediately. Poll /transcriptions/{id} for results including transcription, speaker diarization, and non-speech events.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| non_speech | boolean | false | Include non-speech sounds |
| filename | string | "audio" | Name for the uploaded file |
| cache | boolean | true | Use cached results for identical files |
Request
Body: raw audio file bytes.
curl:

    curl -X POST "https://api.fluxions.ai/akro/submit" \
      -H "Authorization: YOUR_API_KEY" \
      -H "Content-Type: audio/mpeg" \
      --data-binary @audio.mp3

Python:

    import requests

    with open('audio.mp3', 'rb') as f:
        response = requests.post(
            'https://api.fluxions.ai/akro/submit',
            headers={'Authorization': 'YOUR_API_KEY', 'Content-Type': 'audio/mpeg'},
            data=f
        )
    job = response.json()
    job_id = job['id']

JavaScript:

    // audioFile is a File or Blob; the body is the raw audio bytes, as in the curl example
    const response = await fetch('https://api.fluxions.ai/akro/submit', {
      method: 'POST',
      headers: {'Authorization': 'YOUR_API_KEY', 'Content-Type': 'audio/mpeg'},
      body: audioFile
    });
    const job = await response.json();
    const jobId = job.id;

Response

    {
      "id": 124,
      "status": "submitted",
      "created_at": "2025-10-24T10:35:00.000Z",
      "original_audio_url": "https://...",
      "query_urls": {
        "get": "https://api.fluxions.ai/transcriptions/124",
        "status": "https://api.fluxions.ai/transcriptions/124"
      },
      "cached": false
    }
Workflow
- Submit audio via /submit and receive a job ID
- Poll /transcriptions/{id} to check status
- When status is "completed", retrieve full results
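The workflow above can be sketched as a small polling loop. This is a minimal sketch, not an official client: `wait_for_transcription` and `fetch_job` are hypothetical names, and the 2-second interval and 10-minute timeout are arbitrary choices. The status strings and the `error_message` field are the ones this page documents.

```python
import time
from typing import Callable


def wait_for_transcription(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,    # assumed polling interval, tune as needed
    timeout: float = 600.0,   # assumed overall deadline
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches "completed" or "failed".

    fetch_job is any callable returning the parsed JSON of
    GET /transcriptions/{id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(job.get("error_message", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

In practice `fetch_job` would wrap something like `requests.get(url, headers=...).json()`. Injecting it keeps the loop easy to test without network access.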
GET /transcriptions/{id} — Get Transcription Results
Retrieve the full results for a specific job: transcription, speaker diarization, and non-speech events.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| word_level_timestamps | boolean | false | Include word-level timestamps in segments |
Request
curl:

    curl "https://api.fluxions.ai/transcriptions/124" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    response = requests.get(
        'https://api.fluxions.ai/transcriptions/124',
        headers={'Authorization': 'YOUR_API_KEY'}
    )
    result = response.json()
    if result['status'] == 'completed':
        print(result['text'])

JavaScript:

    const response = await fetch(
      'https://api.fluxions.ai/transcriptions/124',
      {headers: {'Authorization': 'YOUR_API_KEY'}}
    );
    const result = await response.json();
    if (result.status === 'completed') {
      console.log(result.text);
    }

Response

    {
      "id": 124,
      "status": "completed",
      "created_at": "2025-10-24T10:35:00.000Z",
      "updated_at": "2025-10-24T10:35:20.000Z",
      "filename": "interview.mp3",
      "audio_duration": 300.0,
      "audio_format": "opus",
      "processing_time": 245.5,
      "language": "en",
      "non_speech": false,
      "num_chunks": 11,
      "num_segments": 25,
      "num_speakers": 2,
      "text": "SPEAKER_0: Yeah, let's actually start off exactly, where we initially began.\nSPEAKER_1: Sounds perfect. That makes complete sense to me.\nSPEAKER_0: So I started thinking about what if this is just a construct?",
      "segments": [
        {"speaker": "0", "text": "Yeah, let's actually start off exactly, where we initially began.", "start": 0.86, "end": 6.42, "segment_idx": 0},
        {"speaker": "1", "text": "Sounds perfect", "start": 6.0, "end": 7.2, "segment_idx": 0},
        {"speaker": "1", "text": "That makes complete sense to me.", "start": 7.5, "end": 9.8, "segment_idx": 1}
      ],
      "audio_url": "https://...r2.cloudflarestorage.com/...",
      "cached": true
    }
Status Values
- submitted: Job has been submitted
- processing: Transcription in progress
- completed: Transcription finished successfully
- failed: Transcription failed (check error_message)
GET /transcriptions — List Transcriptions
List all transcriptions for your account.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | integer | 50 | Number of results per page (max: 100) |
| offset | integer | 0 | Pagination offset |
Request
curl:

    curl "https://api.fluxions.ai/transcriptions?limit=10&offset=0" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    response = requests.get(
        'https://api.fluxions.ai/transcriptions',
        headers={'Authorization': 'YOUR_API_KEY'},
        params={'limit': 10, 'offset': 0}
    )
    data = response.json()
    print(f"Total: {data['total']}, Found: {len(data['transcriptions'])} transcriptions")
    for t in data['transcriptions']:
        print(f" ID {t['id']}: {t['filename']} - {t['status']}")

JavaScript:

    const response = await fetch(
      'https://api.fluxions.ai/transcriptions?limit=10&offset=0',
      {headers: {'Authorization': 'YOUR_API_KEY'}}
    );
    const data = await response.json();
    console.log(`Total: ${data.total}, Found: ${data.transcriptions.length} transcriptions`);
    data.transcriptions.forEach(t => {
      console.log(` ID ${t.id}: ${t.filename} - ${t.status}`);
    });

Response

    {
      "total": 150,
      "limit": 10,
      "offset": 0,
      "transcriptions": [
        {
          "id": 150,
          "status": "completed",
          "created_at": "2025-10-24T10:40:00.000Z",
          "filename": "interview.mp3",
          "audio_duration": 1800.0,
          "audio_format": "opus",
          "processing_time": 45.2,
          "num_speakers": 2,
          "num_segments": 142,
          "original_audio_url": "https://...",
          "language": "en"
        }
      ]
    }
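To walk every page rather than just the first, the `limit`, `offset`, and `total` fields can drive a simple loop. The following is a sketch under stated assumptions: `iter_transcriptions` and `fetch_page` are hypothetical names, and `fetch_page(limit, offset)` stands in for whatever performs the GET /transcriptions request and returns the parsed JSON.

```python
from typing import Callable, Iterator


def iter_transcriptions(
    fetch_page: Callable[[int, int], dict],
    limit: int = 100,  # documented maximum page size
) -> Iterator[dict]:
    """Yield every transcription by advancing offset until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["transcriptions"]
        yield from items
        offset += len(items)
        # Stop on an empty page as well, so a changing total cannot loop forever.
        if not items or offset >= page["total"]:
            break
```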
Response Format
Text Field
The text field contains the full transcription with speaker labels and optional non-speech events:
- Speaker Labels: SPEAKER_0:, SPEAKER_1:, etc. prefix each speaker's utterances
- Line Breaks: Newlines (\n) separate different speaker turns
- Non-speech Events: When enabled, events like [breath], [pause] appear inline
Example:
    SPEAKER_0: Yeah, let's start [breath] where we began.
    SPEAKER_1: Sounds good. That makes sense.
    SPEAKER_0: So I was thinking about [pause] what if this is a construct?
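Because each turn starts with a SPEAKER_n: label and turns are separated by newlines, the text field can be split into (speaker, utterance) pairs with a short parser. This is a sketch under those stated rules. `parse_turns` is a hypothetical helper, and attaching unlabeled lines to the previous turn is an assumption.

```python
import re

# One turn per line: "SPEAKER_<n>: <utterance>"
TURN_RE = re.compile(r"^SPEAKER_(\d+):\s*(.*)$")


def parse_turns(text: str) -> list:
    """Split the text field into (speaker_id, utterance) tuples."""
    turns = []
    for raw_line in text.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # Assumption: a line without a label continues the previous turn.
            speaker, previous = turns[-1]
            turns[-1] = (speaker, previous + " " + line)
    return turns
```

Speaker ids come back as strings ("0", "1"), matching the speaker field in the segments array.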
Segments Array
The segments array provides precise timing and speaker information for each utterance:
- speaker: Speaker ID as a string ("0", "1", etc.)
- text: The spoken text for this segment (without non-speech events)
- start: Start time in seconds (decimal precision)
- end: End time in seconds (decimal precision)
- segment_idx: Sequential index for this segment
Non-Speech Events
When non_speech=true, our listening model captures various non-speech sounds and events that provide additional context to the conversation.
Common Non-Speech Sounds
| Event | Tag | Description | Example Usage |
|---|---|---|---|
| Breath | [breath] | Audible breathing sounds | ...end of sentence. [breath] Now this is important. |
| Laugh | [laugh] or hahaha | Laughter - can be written as text or tagged for longer laughs | Oh wow! hahaha [breath] that's hilarious. |
| Hesitation | [hesitation] or [hesitate] | Unclear thinking noises or mouth sounds while pausing - not specific words | Well [hesitation] um I'm not really sure. |
| Pause | [pause] | Unnaturally long, noticeable pause (e.g., looking something up) | Let me just uh... [pause] Let me look this up. |
| Environment | [env] | Background noise or environmental sounds | I was thinking [env] about what you said. |
| Tut | [tut] | Tongue click or lip smack sound | [tut] That's not quite right. |
| Sigh | [sigh] | Expressive exhale sound | [sigh] I suppose you're right. |
| Sniff | [sniff] | Nasal inhale or sniffing sound | [sniff] Something smells good in here. |
| Cough | [cough] | Coughing sound | Sorry, excuse me [cough] as I was saying... |
Usage Notes
- Non-speech events are placed inline with the transcribed text
- Events appear at their natural position in the conversation flow
- Word elongation is marked with ellipsis: um... so... I think...
- Emphasis on words uses asterisks: I *really* think so
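Since events are placed inline as bracketed tags, downstream code often needs either a tally of the events or a clean transcript without them. A minimal sketch follows. `count_events` and `strip_events` are hypothetical helpers, and the pattern assumes tags are lowercase words in square brackets, as in the table above.

```python
import re
from collections import Counter

# Assumption: every event tag is a lowercase word in square brackets.
TAG_RE = re.compile(r"\[([a-z]+)\]")


def count_events(text: str) -> Counter:
    """Tally bracketed non-speech tags such as [breath] or [pause]."""
    return Counter(TAG_RE.findall(text))


def strip_events(utterance: str) -> str:
    """Remove tags from a single utterance and collapse leftover whitespace."""
    return " ".join(TAG_RE.sub(" ", utterance).split())
```

Untagged laughter written as text (for example hahaha) is left alone, because nothing distinguishes it from ordinary words. `strip_events` collapses all whitespace, so apply it per turn rather than to the whole multi-line text field.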
Speech
Hosted VUI — expressive, low-latency text-to-speech. Send text, get back audio in a natural voice, with support for non-verbal cues like [sigh] and [laugh].
Two ways to render text:
- HTTP (POST /v1/tts): one request, one render. Simplest to integrate.
- WebSocket (/v1/tts/ws): keep a warm socket open across renders so each one skips the TLS/TCP handshake and reaches first audio sooner. Use this for interactive UIs.
Pricing: $10 per 1M characters (≈ $0.45 per hour of audio). See pricing.
Base URL
Speech is served through the unified Fluxions API gateway under the /vui namespace:
https://api.fluxions.ai/vui
Authentication
Built-in voices are public: no API key needed. A private voice you've cloned requires your credential in the Authorization header, either Bearer <token> for a signed-in session or your raw API key. See Voices below.
GET /voices — List Voices
List the built-in voices available to everyone. No authentication required.
Request
curl:

    curl "https://api.fluxions.ai/vui/voices"

Python:

    import requests

    voices = requests.get('https://api.fluxions.ai/vui/voices').json()['voices']
    for v in voices:
        print(v['voice_id'], '—', v['preview_text'][:50])

JavaScript:

    const { voices } = await fetch('https://api.fluxions.ai/vui/voices').then(r => r.json());
    voices.forEach(v => console.log(v.voice_id, '—', v.preview_text.slice(0, 50)));

Response

    {
      "voices": [
        { "voice_id": "maeve.h736bab09a", "preview_text": "I just, I want you to know how proud I am of you..." },
        { "voice_id": "abraham.h736bab09a", "preview_text": "I've finished analysing the document you uploaded..." },
        { "voice_id": "harry.h736bab09a", "preview_text": "Hello, this is Harry. I'm calling you..." }
      ]
    }
Pass any voice_id as the voice field when rendering.
POST /v1/tts — Render (HTTP)
Synthesize speech from text. Returns a complete WAV by default, or streams audio chunk-by-chunk when stream=1.
Parameters
JSON body:
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice | string | (required) | A voice_id from GET /voices |
| input | string | (required) | Text to speak. Supports non-verbal cues (see below) |
| temperature | float | 0.9 | Sampling temperature; higher is more varied |
| response_format | string | "wav" | "wav" (complete file) or "pcm" (raw s16le @ 24 kHz) |
| stream | boolean | false | Stream audio as it's generated instead of buffering the whole file |
| max_secs | float | (auto) | Hard ceiling on output length. Auto-estimated from text length when omitted |
| verify_chunks | boolean | true | Re-checks each rendered chunk with a fast speech-to-text pass and re-renders any that misread the text. Improves reliability at the cost of latency. Set false for the lowest-latency stream (see Streaming) |
Request
curl:

    curl -X POST "https://api.fluxions.ai/vui/v1/tts" \
      -H "Content-Type: application/json" \
      -d '{"voice": "maeve.h736bab09a", "input": "[sigh] fine, I will say it one more time."}' \
      --output speech.wav

Python:

    import requests

    r = requests.post(
        'https://api.fluxions.ai/vui/v1/tts',
        json={'voice': 'maeve.h736bab09a', 'input': '[sigh] fine, I will say it one more time.'}
    )
    with open('speech.wav', 'wb') as f:
        f.write(r.content)

JavaScript:

    const r = await fetch('https://api.fluxions.ai/vui/v1/tts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ voice: 'maeve.h736bab09a', input: '[sigh] fine, I will say it one more time.' })
    });
    const wav = await r.blob();
    const url = URL.createObjectURL(wav);
    new Audio(url).play();
Response
200 OK with the audio bytes. Content-Type is audio/wav (or audio/L16 when response_format is "pcm").
Streaming
Add stream=1 (query param or body field) to receive audio as it's generated, delivered as chunked transfer encoding.
By default (verify_chunks: true) each chunk is checked — and re-rendered if it misreads the text — before it streams, so the first audio lands once the first chunk is rendered and verified (~1 s for a typical sentence). Set verify_chunks: false to stream each chunk the instant the model produces it, unverified: first bytes then land within ~80 ms.
curl:

    curl -X POST "https://api.fluxions.ai/vui/v1/tts?stream=1" \
      -H "Content-Type: application/json" \
      -d '{"voice": "maeve.h736bab09a", "input": "Streaming starts playing almost immediately."}' \
      --output speech.wav

Python:

    import requests

    with requests.post(
        'https://api.fluxions.ai/vui/v1/tts?stream=1',
        json={'voice': 'maeve.h736bab09a', 'input': 'Streaming starts playing almost immediately.'},
        stream=True,
    ) as r, open('speech.wav', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

JavaScript:

    const r = await fetch('https://api.fluxions.ai/vui/v1/tts?stream=1', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ voice: 'maeve.h736bab09a', input: 'Streaming starts playing almost immediately.' }),
    });
    const reader = r.body.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // `value` is a Uint8Array chunk of the streaming WAV; append or play as it arrives
    }
WebSocket /v1/tts/ws — Render (warm socket)
Identical render logic to POST /v1/tts, but the socket stays open between renders. Hold it open and the TLS/TCP/tunnel handshake is paid once — each subsequent speak goes straight to synthesis. Ideal for typing UIs or back-to-back lines.
Audio is delivered as binary frames of s16le PCM, mono, 24 kHz (no WAV header — assemble it yourself if you need a file).
Protocol
Client → server (text JSON):
    { "type": "speak", "voice": "<id>", "input": "<text>", "temperature": 0.9, "max_secs": 0, "verify_chunks": true, "token": "Bearer <jwt>" }
    { "type": "session.close" }
temperature, max_secs, and verify_chunks are optional. verify_chunks defaults to true; set it false for the lowest-latency stream (see Streaming).
Authentication. Built-in voices are public — omit token. A private cloned voice needs token set to the same value you'd put in the Authorization header: Bearer <clerk-jwt> for a signed-in session, or your raw API key. It rides in the speak message because browsers can't set headers on a WebSocket. The token is checked per speak, so you can mix public and private voices on one socket.
Server → client:
| Message | Meaning |
|---|---|
| {"type": "start"} | The worker stream opened; audio frames follow |
| (binary frame) | A chunk of s16le PCM @ 24 kHz |
| {"type": "done"} | Current render finished; socket stays open for the next speak |
| {"type": "error", "message": "..."} | Render failed (socket stays open) |
One render = one speak → start → binary PCM* → done. Send another speak on the same socket to render again.
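The message sequence for one render can be handled by a small transport-independent function, which also makes the error case explicit. The sketch below assumes only what the table states: text frames are JSON with a type field, and binary frames are PCM. `collect_render` is a hypothetical name, and `messages` can be any iterable of frames.

```python
import json
from typing import Iterable, Union


def collect_render(messages: Iterable[Union[str, bytes]]) -> bytes:
    """Consume server frames for one render and return the PCM bytes.

    Returns at "done", raises on "error", and raises if the stream
    ends before either arrives.
    """
    pcm = bytearray()
    for msg in messages:
        if isinstance(msg, (bytes, bytearray)):
            pcm += msg  # binary frame: s16le PCM @ 24 kHz
            continue
        event = json.loads(msg)
        kind = event.get("type")
        if kind == "done":
            return bytes(pcm)
        if kind == "error":
            raise RuntimeError(event.get("message", "render failed"))
        # "start" and any unrecognised types carry no audio; keep reading.
    raise RuntimeError("stream ended before done")
```

Checking for error alongside done matters because the socket stays open after a failed render. A loop that waits only for done would block indefinitely.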
Request
Python:

    import asyncio, json, websockets

    async def render(text, voice='maeve.h736bab09a'):
        pcm = bytearray()
        async with websockets.connect('wss://api.fluxions.ai/vui/v1/tts/ws') as ws:
            await ws.send(json.dumps({'type': 'speak', 'voice': voice, 'input': text}))
            async for msg in ws:
                if isinstance(msg, bytes):
                    pcm += msg  # s16le PCM @ 24 kHz
                    continue
                event = json.loads(msg)
                if event['type'] == 'done':
                    break
                if event['type'] == 'error':
                    # the socket stays open after an error, so stop waiting explicitly
                    raise RuntimeError(event.get('message', 'render failed'))
        return bytes(pcm)

    audio = asyncio.run(render('[sigh] so you want to force me to say things.'))

JavaScript:

    const ws = new WebSocket('wss://api.fluxions.ai/vui/v1/tts/ws');
    ws.binaryType = 'arraybuffer';
    const chunks = [];
    ws.addEventListener('open', () => {
      ws.send(JSON.stringify({ type: 'speak', voice: 'maeve.h736bab09a', input: '[laugh] oh, you are serious?' }));
    });
    ws.addEventListener('message', (ev) => {
      if (typeof ev.data === 'string') {
        const m = JSON.parse(ev.data);
        if (m.type === 'done') {
          // chunks now hold the full s16le PCM @ 24 kHz; feed to WebAudio or wrap in a WAV
          ws.send(JSON.stringify({ type: 'session.close' }));
        }
        return;
      }
      chunks.push(new Int16Array(ev.data)); // raw PCM frame
    });
Non-Verbal Cues
Wrap a cue in square brackets inside input and the model renders it as an expressive sound rather than reading the word aloud:
| Cue | Effect |
|---|---|
| [sigh] | Audible sigh |
| [laugh] | Laughter |
| [gasp] | Sharp intake of breath |
| [sniff] | Sniffle |
| [cough] | Cough |
| [hesitate] | Filler / thinking sound |
Example: "[gasp] you did NOT just put pineapple on that pizza! [laugh] okay, okay."
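A bracketed word that is not a recognised cue may be read aloud or ignored, so it can help to check input text before rendering. The sketch below treats the table above as the complete cue list, which is an assumption (the table may not be exhaustive). `unknown_cues` is a hypothetical helper.

```python
import re

# Assumption: the cue table above is the full supported set.
SUPPORTED_CUES = {"sigh", "laugh", "gasp", "sniff", "cough", "hesitate"}

CUE_RE = re.compile(r"\[([^\[\]]+)\]")


def unknown_cues(text: str) -> list:
    """Return bracketed cues in text that are not in SUPPORTED_CUES, sorted."""
    found = {cue.strip().lower() for cue in CUE_RE.findall(text)}
    return sorted(found - SUPPORTED_CUES)
```

An empty list means every bracketed cue in the input matches the table.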
Voices
Built-in voices (GET /voices) are public. You can also clone a custom voice from a short reference clip. Cloned voices are private to your account and require your Authorization token on every render — pass it as the Bearer <token> header for HTTP, or in the token field for the WebSocket.
POST /v1/voices — Clone a Voice
Upload a reference clip plus its transcript; the model encodes a private voice you can render with. Requires authentication. Sent as multipart/form-data.
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | yes | Reference clip (wav/opus/etc.). A few clean seconds is enough. Max 25 MB |
| text | string | no | Exact transcript of the reference clip. Omit it and we transcribe the clip for you before cloning |
| name | string | no | Display label (defaults to the filename) |
Leave text out and the server runs your clip through transcription automatically, so the simplest clone is just an audio file. Pass text yourself when you want exact control over the transcript.
curl:

    curl -X POST "https://api.fluxions.ai/vui/v1/voices" \
      -H "Authorization: Bearer YOUR_TOKEN" \
      -F "audio=@reference.wav" \
      -F "text=This is exactly what the reference clip says." \
      -F "name=My Voice"

Python:

    import requests

    # open the clip in a with-block so the file handle is closed after the upload
    with open('reference.wav', 'rb') as audio:
        r = requests.post(
            'https://api.fluxions.ai/vui/v1/voices',
            headers={'Authorization': 'Bearer YOUR_TOKEN'},
            data={'text': 'This is exactly what the reference clip says.', 'name': 'My Voice'},
            files={'audio': ('reference.wav', audio, 'audio/wav')},
        )
    voice_id = r.json()['voice_id']

JavaScript:

    const fd = new FormData();
    fd.append('audio', fileInput.files[0]);
    fd.append('text', 'This is exactly what the reference clip says.');
    fd.append('name', 'My Voice');
    const { voice_id } = await fetch('https://api.fluxions.ai/vui/v1/voices', {
      method: 'POST',
      headers: { Authorization: 'Bearer YOUR_TOKEN' },
      body: fd,
    }).then(r => r.json());
Response: { "voice_id": "u-<user>-<hash>", "name": "My Voice", "frames": 173, "seconds": 13.8 }. Pass the returned voice_id as voice in any render call (with your token).
GET /v1/voices/mine — List Your Cloned Voices
curl:

    curl "https://api.fluxions.ai/vui/v1/voices/mine" \
      -H "Authorization: Bearer YOUR_TOKEN"

Python:

    import requests

    voices = requests.get(
        'https://api.fluxions.ai/vui/v1/voices/mine',
        headers={'Authorization': 'Bearer YOUR_TOKEN'},
    ).json()['voices']

JavaScript:

    const { voices } = await fetch('https://api.fluxions.ai/vui/v1/voices/mine', {
      headers: { Authorization: 'Bearer YOUR_TOKEN' },
    }).then(r => r.json());
Returns { "voices": [ { "voice_id": "u-...-ab12cd34", "name": "My Voice" } ] }.
POST /v1/voices/delete — Remove a Cloned Voice
curl:

    curl -X POST "https://api.fluxions.ai/vui/v1/voices/delete" \
      -H "Authorization: Bearer YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"voice_id": "u-...-ab12cd34"}'

Python:

    import requests

    requests.post(
        'https://api.fluxions.ai/vui/v1/voices/delete',
        headers={'Authorization': 'Bearer YOUR_TOKEN'},
        json={'voice_id': 'u-...-ab12cd34'},
    )

JavaScript:

    await fetch('https://api.fluxions.ai/vui/v1/voices/delete', {
      method: 'POST',
      headers: { Authorization: 'Bearer YOUR_TOKEN', 'Content-Type': 'application/json' },
      body: JSON.stringify({ voice_id: 'u-...-ab12cd34' }),
    });
Note: cloned voices currently live in the running server's memory, not a database. They're tied to your account but are not guaranteed to survive a server restart. Re-upload if a voice_id stops resolving.
Output Format
- Sample rate: 24,000 Hz
- Channels: mono
- Sample format: signed 16-bit little-endian PCM
- HTTP wav: PCM wrapped in a standard WAV container
- HTTP pcm / WebSocket binary frames: raw s16le PCM (no header)
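When you receive raw PCM (HTTP pcm or WebSocket frames) but need a playable file, Python's standard `wave` module can add the header. A minimal sketch using the format listed above follows. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw s16le PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16-bit samples = 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Writing the result to disk with `open('speech.wav', 'wb').write(...)` should produce a file that ordinary audio players accept.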
History
The History API is one read-only surface over everything you've done on the platform — transcriptions, TTS renders, and voice conversations — under a single host. Use it to list, page, filter, and search your activity, and to fetch download links for the underlying audio and transcripts.
All history endpoints require authentication — see Overview for API key setup.
Base URL: https://api.fluxions.ai
One shape for everything
Every list response uses the same envelope:
    {
      "object": "list",
      "page": 1,
      "limit": 20,
      "total": 137,
      "has_more": true,
      "data": [ /* items */ ]
    }
Every item carries an object field telling you its type ("transcription", "tts", or "conversation") plus its native id. To fetch one item's detail, call the matching collection path with that id: /history/transcriptions/{id}, /history/tts/{id}, or /history/conversations/{id} (e.g. /history/tts/123). Timestamps are ISO-8601 UTC; costs are in US dollars.
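The detail routes documented on this page are /history/transcriptions/{id}, /history/tts/{id}, and /history/conversations/{id}. Note that tts takes no plural "s", so an explicit lookup is safer than appending one. A small sketch, with `detail_path` as a hypothetical helper:

```python
# Maps an item's "object" value to its collection segment in the URL.
COLLECTION_BY_OBJECT = {
    "transcription": "transcriptions",
    "tts": "tts",
    "conversation": "conversations",
}


def detail_path(item: dict) -> str:
    """Return the detail path for a history item from a list response."""
    collection = COLLECTION_BY_OBJECT[item["object"]]
    return f"/history/{collection}/{item['id']}"
```

Items of other types (for example conversation_turn in search results) raise KeyError here, since this page lists no detail route for them.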
Shared query parameters
These work on every collection (and the unified feed):
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number (1-based) |
| limit | integer | 20 | Results per page (max: 100) |
| order | string | desc | Sort by time: asc or desc |
| since | string | (none) | Only items at/after this time (ISO-8601 or epoch seconds) |
| until | string | (none) | Only items at/before this time (ISO-8601 or epoch seconds) |
Collection-specific filters: voice (tts, conversations), status (transcriptions), type (the unified feed).
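Combining page, limit, and the has_more flag from the envelope gives a straightforward way to read a whole collection. The sketch below is illustrative only: `iter_history` and `fetch_page` are hypothetical names, and `fetch_page(params)` stands in for a GET against any history collection that returns the parsed envelope.

```python
from typing import Callable, Iterator


def iter_history(
    fetch_page: Callable[[dict], dict],
    limit: int = 100,  # documented maximum
    **filters,
) -> Iterator[dict]:
    """Yield items page by page until has_more is false."""
    page = 1
    while True:
        body = fetch_page({"page": page, "limit": limit, **filters})
        yield from body["data"]
        if not body.get("has_more"):
            break
        page += 1
```

Extra keyword arguments such as `since`, `until`, `order`, or a collection-specific filter are passed through unchanged as query parameters.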
GET /history — Unified Feed
A merged, reverse-chronological feed across all three types. Filter the streams with type (comma-separated).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| type | string | (all) | Restrict to transcription, tts, and/or conversation (csv) |
(plus all shared parameters above)
Request
curl:

    curl "https://api.fluxions.ai/history?limit=10&type=tts,conversation" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    r = requests.get(
        'https://api.fluxions.ai/history',
        headers={'Authorization': 'YOUR_API_KEY'},
        params={'limit': 10, 'type': 'tts,conversation'},
    )
    for item in r.json()['data']:
        print(item['object'], item['id'], item['created_at'])

JavaScript:

    const res = await fetch('https://api.fluxions.ai/history?limit=10&type=tts,conversation', {
      headers: {'Authorization': 'YOUR_API_KEY'}
    });
    const { data } = await res.json();
    data.forEach(i => console.log(i.object, i.id, i.created_at));

Response

    {
      "object": "list",
      "page": 1, "limit": 10, "total": 84, "has_more": true,
      "data": [
        { "object": "conversation", "id": "sess_abc", "created_at": "2026-06-29T10:40:00Z",
          "cost_usd": null, "voice": "maeve.en-us", "duration_secs": 312.4, "turn_count": 18 },
        { "object": "tts", "id": 123, "created_at": "2026-06-29T10:32:00Z",
          "cost_usd": 0.0123, "voice": "maeve.en-us", "chars": 842, "audio_secs": 58.4 }
      ]
    }
The feed is lightweight: it does not include presigned download_url values. Use the typed collection or detail endpoints to get them.
GET /history/transcriptions — Transcription History
List your transcriptions.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| status | string | (none) | Filter by status (e.g. completed) |
| include_download_url | boolean | false | Include a presigned audio URL per item |
(plus all shared parameters)
Request
curl:

    curl "https://api.fluxions.ai/history/transcriptions?status=completed&limit=5" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    r = requests.get(
        'https://api.fluxions.ai/history/transcriptions',
        headers={'Authorization': 'YOUR_API_KEY'},
        params={'status': 'completed', 'limit': 5},
    )
    print(r.json()['total'], 'transcriptions')

JavaScript:

    const res = await fetch('https://api.fluxions.ai/history/transcriptions?status=completed&limit=5', {
      headers: {'Authorization': 'YOUR_API_KEY'}
    });
    console.log((await res.json()).total, 'transcriptions');

Response

    {
      "object": "list",
      "page": 1, "limit": 5, "total": 42, "has_more": true,
      "data": [
        {
          "object": "transcription",
          "id": 456,
          "created_at": "2026-06-29T10:35:00Z",
          "cost_usd": 0.10,
          "status": "completed",
          "filename": "interview.mp3",
          "audio_duration_secs": 1800.0,
          "audio_format": "opus",
          "language": "en",
          "num_speakers": 2,
          "num_segments": 142
        }
      ]
    }
GET /history/transcriptions/{id} — One Transcription
Returns the full record with presigned download_url (audio), text_url, and segments_url. 404 if it isn't yours.
    curl "https://api.fluxions.ai/history/transcriptions/456" \
      -H "Authorization: YOUR_API_KEY"
GET /history/tts — TTS Render History
List your text-to-speech renders.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice | string | (none) | Filter by voice id |
| include_download_url | boolean | true | Include a presigned Opus URL per item |
(plus all shared parameters)
Request
curl:

    curl "https://api.fluxions.ai/history/tts?voice=maeve.en-us&limit=10" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    r = requests.get(
        'https://api.fluxions.ai/history/tts',
        headers={'Authorization': 'YOUR_API_KEY'},
        params={'voice': 'maeve.en-us', 'limit': 10},
    )
    for render in r.json()['data']:
        print(render['id'], render['chars'], render['download_url'])

JavaScript:

    const res = await fetch('https://api.fluxions.ai/history/tts?voice=maeve.en-us&limit=10', {
      headers: {'Authorization': 'YOUR_API_KEY'}
    });
    const { data } = await res.json();
    data.forEach(r => console.log(r.id, r.chars, r.download_url));

Response

    {
      "object": "list",
      "page": 1, "limit": 10, "total": 60, "has_more": true,
      "data": [
        {
          "object": "tts",
          "id": 123,
          "created_at": "2026-06-29T10:32:00Z",
          "cost_usd": 0.0123,
          "voice": "maeve.en-us",
          "chars": 842,
          "audio_secs": 58.4,
          "download_url": "https://...r2.cloudflarestorage.com/...opus"
        }
      ]
    }
GET /history/tts/{id} — One Render
Returns one render with a fresh signed download_url. 404 if it isn't yours.
    curl "https://api.fluxions.ai/history/tts/123" \
      -H "Authorization: YOUR_API_KEY"
GET /history/conversations — Conversation History
List your voice conversations (agent calls).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice | string | (none) | Filter by voice id |
(plus all shared parameters)
Request
curl:

    curl "https://api.fluxions.ai/history/conversations?limit=10" \
      -H "Authorization: YOUR_API_KEY"

Python:

    import requests

    r = requests.get(
        'https://api.fluxions.ai/history/conversations',
        headers={'Authorization': 'YOUR_API_KEY'},
        params={'limit': 10},
    )
    for c in r.json()['data']:
        print(c['id'], c['turn_count'], c['duration_secs'])

JavaScript:

    const res = await fetch('https://api.fluxions.ai/history/conversations?limit=10', {
      headers: {'Authorization': 'YOUR_API_KEY'}
    });
    const { data } = await res.json();
    data.forEach(c => console.log(c.id, c.turn_count, c.duration_secs));

Response

    {
      "object": "list",
      "page": 1, "limit": 10, "total": 23, "has_more": true,
      "data": [
        {
          "object": "conversation",
          "id": "sess_abc123",
          "created_at": "2026-06-29T10:40:00Z",
          "cost_usd": null,
          "voice": "maeve.en-us",
          "started_at": "2026-06-29T10:40:00Z",
          "ended_at": "2026-06-29T10:45:12Z",
          "duration_secs": 312.4,
          "turn_count": 18
        }
      ]
    }
GET /history/conversations/{id} — One Conversation
Returns the session plus its turn-by-turn transcript. 404 if it isn't yours.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| include_turns | boolean | true | Include the transcript turns |
| include_tool_calls | boolean | false | Include tool invocations (calendar, email, …) |
| turns_limit | integer | 500 | Max turns to return (max: 2000) |
Request
    curl "https://api.fluxions.ai/history/conversations/sess_abc123?include_tool_calls=true" \
      -H "Authorization: YOUR_API_KEY"

Response

    {
      "object": "conversation",
      "id": "sess_abc123",
      "created_at": "2026-06-29T10:40:00Z",
      "voice": "maeve.en-us",
      "duration_secs": 312.4,
      "turn_count": 18,
      "turns": [
        { "object": "conversation_turn", "id": 9001, "session_id": "sess_abc123",
          "role": "user", "text": "What's on my calendar today?", "created_at": "2026-06-29T10:40:05Z" },
        { "object": "conversation_turn", "id": 9002, "session_id": "sess_abc123",
          "role": "assistant", "text": "You have two meetings...", "created_at": "2026-06-29T10:40:08Z" }
      ],
      "tool_calls": [
        { "object": "tool_call", "id": 51, "tool": "calendar",
          "args": {"range": "today"}, "result": "2 events", "created_at": "2026-06-29T10:40:07Z" }
      ]
    }
GET /history/conversations/search — Search Turns
Full-text search across your conversation turns (Postgres websearch_to_tsquery).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | (required) | Search query |
| limit | integer | 20 | Max results (max: 100) |
Request
    curl "https://api.fluxions.ai/history/conversations/search?q=dentist+appointment" \
      -H "Authorization: YOUR_API_KEY"

Response

    {
      "object": "list",
      "query": "dentist appointment",
      "data": [
        { "object": "conversation_turn", "id": 9100, "session_id": "sess_def456",
          "role": "user", "text": "remind me about the dentist appointment",
          "created_at": "2026-06-20T14:02:00Z" }
      ]
    }
GET /history/search — Cross-Domain Search
Search across your whole history in one call. Conversation turns are matched by full text; transcriptions are matched by filename (their text lives in object storage, not the database). Results are type-tagged via object.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | (required) | Search query |
| limit | integer | 20 | Max results per domain (max: 100) |
Request
    curl "https://api.fluxions.ai/history/search?q=interview" \
      -H "Authorization: YOUR_API_KEY"

Response

    {
      "object": "list",
      "query": "interview",
      "data": [
        { "object": "conversation_turn", "id": 9200, "session_id": "sess_ghi",
          "role": "assistant", "text": "...the interview went well...", "created_at": "2026-06-25T09:00:00Z" },
        { "object": "transcription", "id": 456, "created_at": "2026-06-29T10:35:00Z",
          "status": "completed", "filename": "interview.mp3", "audio_duration_secs": 1800.0 }
      ]
    }
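Because cross-domain results arrive in one mixed list, tagged by object, a client will usually bucket them by type before display. A minimal sketch, with `group_by_object` as a hypothetical helper operating on the data array of a search response:

```python
from collections import defaultdict


def group_by_object(results: list) -> dict:
    """Bucket search results by their "object" tag, preserving order."""
    groups = defaultdict(list)
    for item in results:
        groups[item["object"]].append(item)
    return dict(groups)
```

For the response above this yields one conversation_turn bucket and one transcription bucket, each holding a single item.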