Fine-tune Vision Language Models (VLMs) on Friendli Dedicated Endpoints using datasets in `.jsonl` or `.parquet` format, where each line represents a sequence of messages. Each message in the conversation should include a `"role"` (e.g., `system`, `user`, or `assistant`) and `"content"`. For VLM fine-tuning, user content can contain both text and image data (for image data, we support URL and Base64).
Here's an example of what a single entry should look like. Note that it is stored as one line in the file, but beautified here for readability:
Sample object containing a conversation with messages:
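The original sample object did not survive extraction, so below is a minimal sketch of one dataset line built from the schema described above. The content-part keys (`type`, `image_url`) follow the common OpenAI-style chat format and are assumptions here, as are the placeholder URL and text; consult the dataset schema for the authoritative field names.

```python
import json

# Hypothetical training example: a "messages" list where each message
# has a "role" and "content". User content mixes text and image parts.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
            ],
        },
        {"role": "assistant", "content": "The image shows a cat."},
    ]
}

# Serialize as a single line, as required for a .jsonl dataset file.
line = json.dumps(example)
print(line)
```

Writing one such serialized object per line produces a valid `.jsonl` dataset file.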
Each element of the user content specifies its type (`TEXT` or `IMAGE`).
`https://friendli.ai/<teamId>/<projectId>/...`
(e.g., `accomplished-shark`).