Audio and Speech: Converting Audio to Text

Friendli provides audio and speech features through Friendli Dedicated Endpoints. You can transcribe audio files to text and run AI tasks that combine audio with text. This guide explains how to use these features with examples for both the Playground and API interfaces. You can find the full list of available models here.

ASR - `/v1/audio/transcriptions`

Our ASR (Automatic Speech Recognition) service is designed for efficient audio transcription.
By default, audio input is limited to 30 seconds. If you require support for longer audio inputs, please contact us.
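Inputs beyond the default limit require contacting Friendli, so it can help to check a clip's length locally before uploading it. The sketch below is minimal and assumes a WAV file. It relies only on Python's standard-library `wave` module, so compressed formats such as MP3 would need a decoder (for example librosa) instead. `ASR_MAX_SECONDS`, `wav_duration_seconds`, and `fits_limit` are hypothetical names, not part of the Friendli API.

```python
import wave

# Hypothetical constant mirroring the default ASR input limit described in this guide.
ASR_MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (standard library only)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_limit(path: str, max_seconds: float = ASR_MAX_SECONDS) -> bool:
    """Check a local WAV file against a duration limit before uploading it."""
    return wav_duration_seconds(path) <= max_seconds
```

For example, `fits_limit("/path/to/file/audio.wav")` returns `False` for a 45-second recording. The same helper can check the 10-second default of the audio modality endpoint if you pass `max_seconds=10`.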

API Usage Example

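```python
import os

from openai import OpenAI

# Point the OpenAI client at your Friendli Dedicated Endpoint.
client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.getenv("FRIENDLI_TOKEN"),
)

# Open the audio file in binary mode; the context manager closes it afterwards.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="YOUR_ENDPOINT_ID",
        file=audio_file,
    )

print(transcription.text)
```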

For more detailed information, please refer to the API reference.

Supported Models

We support a variety of powerful ASR models, including:

Audio Modality - `/v1/chat/completions`

The audio modality endpoint accepts audio and text inputs in the same request, which enables more advanced AI tasks. This endpoint is ideal for:

Complex audio and text analysis
Conversational AI
Tasks that require varied inference over audio, such as summarization, sentiment analysis, and question answering

By default, audio input is limited to 10 seconds. If you require support for longer audio inputs, please contact us.

Passing a URL

```bash
curl -X POST https://api.friendli.ai/dedicated/v1/chat/completions \
     -H "Authorization: Bearer $FRIENDLI_TOKEN" \
     -H "Content-Type: application/json" \
     --data @- <<EOF
{
  "model": "YOUR_ENDPOINT_ID",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this audio?"
        },
        {
          "type": "audio_url",
          "audio_url": {
            "url": "https://example.com/path/to/audio.mp3"
          }
        }
      ]
    }
  ]
}
EOF
```
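You can send the same request without curl. The sketch below is minimal and uses only Python's standard library (`urllib.request` and `json`). It mirrors the curl payload: one text part and one `audio_url` part in a single user message. `build_audio_chat_request` and `ask_about_audio` are hypothetical helper names. Reading the answer from `choices[0].message.content` is an assumption: it presumes the endpoint returns an OpenAI-compatible response, as the use of the OpenAI client elsewhere in this guide suggests.

```python
import json
import os
import urllib.request

# Endpoint URL taken from the curl example.
CHAT_COMPLETIONS_URL = "https://api.friendli.ai/dedicated/v1/chat/completions"


def build_audio_chat_request(endpoint_id: str, prompt: str, audio_url: str,
                             token: str) -> urllib.request.Request:
    """Build a POST request with a text part and an audio_url part."""
    payload = {
        "model": endpoint_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_about_audio(endpoint_id: str, prompt: str, audio_url: str) -> str:
    """Send the request and return the first choice's message text."""
    request = build_audio_chat_request(
        endpoint_id, prompt, audio_url, os.environ["FRIENDLI_TOKEN"]
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible response shape (choices[0].message.content).
    return body["choices"][0]["message"]["content"]
```

For example, `ask_about_audio("YOUR_ENDPOINT_ID", "What's in this audio?", "https://example.com/path/to/audio.mp3")` sends the request with the token read from `FRIENDLI_TOKEN`. Keeping payload construction separate from sending makes the request easy to inspect or log before it goes out.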

Supported Models

We offer a range of multi-modal models, including:

Supported Audio Formats

Our platform supports a wide range of audio formats, namely those compatible with the librosa library. Supported formats include:

MP3 (.mp3)
WAV (.wav)
FLAC (.flac)
OGG (.ogg)
And many other standard audio formats
