Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in the Prometheus text format. By default, the metrics are served at `http://localhost:8281/metrics`. You can configure the port number using the command line option `--metrics-port`.

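
To make the endpoint concrete, here is a minimal sketch of reading it from Python with only the standard library. The helper names `parse_metrics` and `fetch_metrics` are hypothetical, the parser is deliberately simplified (it skips `# HELP` and `# TYPE` comment lines and leaves escaped characters in label values as they are), and the URL assumes the port has not been changed with `--metrics-port`.

```python
import re
import urllib.request

# Assumption: the endpoint quoted in the text; change the port if you set --metrics-port.
METRICS_URL = "http://localhost:8281/metrics"

# A sample line looks like: metric_name{label="value",...} 123.0
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:  # ignore anything this simplified parser cannot read
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the metrics page and parse it (requires a running container)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

In practice a Prometheus server would scrape this endpoint directly; a hand-rolled reader like this is mainly useful for quick checks and scripts.
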
Metric Name | Description |
---|---|
friendli_requests_total | Cumulative number of requests received |
friendli_responses_total | Cumulative number of responses sent |
friendli_items_total | Cumulative number of items requested |
friendli_failure_by_cancel | Cumulative number of failed requests due to cancellation |
friendli_failure_by_timeout | Cumulative number of failed requests due to timeout |
friendli_failure_by_nan_error | Cumulative number of failed requests due to NaN error |
friendli_failure_by_reject | Cumulative number of failed requests due to rejection |
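
Because these counters are cumulative, a rate over a time window comes from the difference between two scrapes. The sketch below shows one way to derive a failure ratio for such a window. The function names are hypothetical, each snapshot is assumed to be a plain dict of metric name to value, and the code assumes that every failed request is counted in exactly one of the four failure counters, which the table does not state explicitly.

```python
FAILURE_COUNTERS = (
    "friendli_failure_by_cancel",
    "friendli_failure_by_timeout",
    "friendli_failure_by_nan_error",
    "friendli_failure_by_reject",
)


def counter_delta(before, after, name):
    """Increase of a cumulative counter between two snapshots."""
    old = before.get(name, 0.0)
    new = after.get(name, 0.0)
    # Assumption: a smaller value means the process restarted and the counter reset.
    return new if new < old else new - old


def failure_summary(before, after):
    """Failure counts and failure ratio for the window between two snapshots."""
    requests = counter_delta(before, after, "friendli_requests_total")
    by_cause = {name: counter_delta(before, after, name) for name in FAILURE_COUNTERS}
    failed = sum(by_cause.values())
    ratio = failed / requests if requests else 0.0
    return {"requests": requests, "failed": failed, "ratio": ratio, "by_cause": by_cause}
```
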
A single request can ask for multiple items through the `n` field in the request body. Upon receiving such a request, `friendli_requests_total` is increased by 1 and `friendli_items_total` is increased by `n`.

Metric Name | Description |
---|---|
friendli_current_requests | Current number of requests in the engine (either assigned or waiting) |
friendli_current_items | Current number of items in the engine (either assigned or waiting) |
friendli_current_assigned_items | Current number of items actively processed by the engine |
friendli_current_waiting_items | Current number of items waiting in the internal queue |
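
As an illustration of how these gauges could be combined, the sketch below computes the share of in-engine items that are still waiting in the queue. The function names and the 0.5 threshold are hypothetical choices for this example, not values recommended by Friendli, and the input is assumed to be a dict of metric name to value taken from a single scrape.

```python
def waiting_share(gauges):
    """Fraction of items in the engine that are waiting rather than being processed."""
    current = gauges.get("friendli_current_items", 0.0)
    waiting = gauges.get("friendli_current_waiting_items", 0.0)
    return waiting / current if current else 0.0


def is_backlogged(gauges, threshold=0.5):
    """True when the waiting share exceeds a hypothetical alert threshold."""
    return waiting_share(gauges) > threshold
```
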
Histogram | Metric Name | Description |
---|---|---|
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_bucket | Bucketized number of histogram samples for TCache hit ratio, with le label |
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_count | Total number of histogram samples for TCache hit ratio |
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_sum | Sum of histogram sample values for TCache hit ratio |
The length of input tokens (Experimental metric) | friendli_input_lengths_bucket | Bucketized number of histogram samples for length of input tokens, with le label |
The length of input tokens (Experimental metric) | friendli_input_lengths_count | Total number of histogram samples for length of input tokens |
The length of input tokens (Experimental metric) | friendli_input_lengths_sum | Sum of histogram sample values for length of input tokens |
The length of output tokens (Experimental metric) | friendli_output_lengths_bucket | Bucketized number of histogram samples for length of output tokens, with le label |
The length of output tokens (Experimental metric) | friendli_output_lengths_count | Total number of histogram samples for length of output tokens |
The length of output tokens (Experimental metric) | friendli_output_lengths_sum | Sum of histogram sample values for length of output tokens |
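
The `_sum`, `_count`, and `_bucket` series can be turned into a mean and an approximate quantile. The sketch below follows the usual Prometheus convention that `le` buckets are cumulative and interpolates linearly inside a bucket, similar in spirit to PromQL's `histogram_quantile`. The function names are hypothetical, the lowest bucket is assumed to start at 0, and the bucket boundaries in the usage example are made up, since the actual boundaries are not listed here.

```python
import math


def histogram_mean(total_sum, count):
    """Mean sample value from the _sum and _count series."""
    return total_sum / count if count else math.nan


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Assumes cumulative `le` buckets and a lower bound of 0 for the first bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]  # the last bucket holds the total sample count
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate inside the +Inf bucket
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Usage with hypothetical bucket boundaries for friendli_tcache_hit_ratio_bucket.
example_buckets = [(0.25, 10), (0.5, 30), (0.75, 60), (1.0, 100), (math.inf, 100)]
median_hit_ratio = histogram_quantile(0.5, example_buckets)
```

The estimate is only as fine-grained as the bucket layout, so it should be read as an approximation rather than an exact percentile.
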
Quantiles | Metric Name | Description |
---|---|---|
Request completion latency (in nanoseconds) | friendli_requests_latencies | Percentile value for request completion latency (quantile label is either 0.5, 0.9, or 0.99) |
Request completion latency (in nanoseconds) | friendli_requests_latencies_count | Total number of samples for request completion latency |
Request completion latency (in nanoseconds) | friendli_requests_latencies_sum | Sum of sample values for request completion latency |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft | Percentile value for time to first token (TTFT) (quantile label is either 0.5, 0.9, or 0.99) |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft_count | Total number of samples for time to first token (TTFT) |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft_sum | Sum of sample values for time to first token (TTFT) |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays | Percentile value for queueing delay (quantile label is either 0.5, 0.9, or 0.99) |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays_count | Total number of samples for queueing delay |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays_sum | Sum of sample values for queueing delay |
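
Since these values are reported in nanoseconds, a small conversion step helps when reading them. The sketch below gathers the quantile samples and the mean (from `_sum` and `_count`) for one metric and converts them to milliseconds. The function names are hypothetical, and the input is assumed to be a list of `(name, labels, value)` tuples produced by whatever parser you use for the metrics page; the sample values in the usage example are invented.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def ns_to_ms(value_ns):
    """Convert a nanosecond value to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


def summarize_latency(samples, metric="friendli_requests_latencies"):
    """Collect quantiles and the mean of one summary metric, in milliseconds."""
    quantiles_ms = {}
    total_ns = 0.0
    count = 0.0
    for name, labels, value in samples:
        if name == metric and "quantile" in labels:
            quantiles_ms[labels["quantile"]] = ns_to_ms(value)
        elif name == metric + "_sum":
            total_ns = value
        elif name == metric + "_count":
            count = value
    mean_ms = ns_to_ms(total_ns / count) if count else None
    return {"quantiles_ms": quantiles_ms, "mean_ms": mean_ms}


# Usage with invented sample values.
example_samples = [
    ("friendli_requests_latencies", {"quantile": "0.5"}, 120_000_000.0),
    ("friendli_requests_latencies", {"quantile": "0.99"}, 950_000_000.0),
    ("friendli_requests_latencies_sum", {}, 3_000_000_000.0),
    ("friendli_requests_latencies_count", {}, 20.0),
]
report = summarize_latency(example_samples)
```

The same function can be pointed at `friendli_requests_ttft` or `friendli_requests_queueing_delays` through the `metric` argument.
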
Metric Name | Label | Description |
---|---|---|
friendli_engine_version | version | Engine version |