Tutorial for serving quantized models with Friendli Engine. Friendli Engine supports FP8, INT8, and AWQ-quantized model checkpoints.
friendli-model-optimizer package
This tool provides model quantization for efficient generative AI serving with Friendli Engine. Install it using the following command:
transformers library.
$OUTPUT_DIR.
config.json
model.safetensors
special_tokens_map.json
tokenizer_config.json
tokenizer.json
model.safetensors
model-00001-of-00005.safetensors
model-00002-of-00005.safetensors
model-00003-of-00005.safetensors
model-00004-of-00005.safetensors
model-00005-of-00005.safetensors
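As a quick sanity check on the directory layouts listed above, the following is a minimal Python sketch that reports which expected files are absent from a checkpoint directory. The helper name `missing_checkpoint_files` is hypothetical and not part of any Friendli tooling. The sketch assumes the weights are stored either as a single `model.safetensors` or as numbered `model-XXXXX-of-YYYYY.safetensors` shards, as in the listings.

```python
import re
from pathlib import Path

# Non-weight files taken from the listing above.
REQUIRED_FILES = [
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]

# Matches sharded weight files such as model-00001-of-00005.safetensors.
SHARD_PATTERN = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_checkpoint_files(output_dir):
    """Return the expected file names that are absent from output_dir.

    Hypothetical helper for illustration; it only inspects file names
    and does not validate file contents.
    """
    names = {p.name for p in Path(output_dir).iterdir() if p.is_file()}
    missing = [name for name in REQUIRED_FILES if name not in names]

    # Single-file layout: nothing more to check.
    if "model.safetensors" in names:
        return missing

    # Sharded layout: map each shard index to the total it declares.
    shards = {}
    for name in names:
        match = SHARD_PATTERN.match(name)
        if match:
            shards[int(match.group(1))] = int(match.group(2))

    if not shards:
        # Neither layout is present, so report the single-file name.
        missing.append("model.safetensors")
        return missing

    total = max(shards.values())
    for index in range(1, total + 1):
        if index not in shards:
            missing.append(f"model-{index:05d}-of-{total:05d}.safetensors")
    return missing
```

Calling `missing_checkpoint_files(output_dir)` on a complete checkpoint directory returns an empty list; any names it returns are the files to look for before serving the checkpoint.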
FriendliAI/Llama-3.1-8B-Instruct-fp8