API Reference¶
Packages¶
inference.networking.x-k8s.io/v1alpha1¶
Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group
Resource Types¶
Criticality¶
Underlying type: string
Defines how important it is to serve the model compared to other models.
Validation: - Enum: [Critical Default Sheddable]
Appears in: - InferenceModelSpec
Field | Description |
---|---|
Critical |
Most important. Requests to this band will be shed last. |
Default |
More important than Sheddable, less important than Critical. Requests in this band will be shed before critical traffic. +kubebuilder:default=Default |
Sheddable |
Least important. Requests to this band will be shed before all other bands. |
InferenceModel¶
InferenceModel is the Schema for the InferenceModels API
Field | Description | Default | Validation |
---|---|---|---|
apiVersion string |
inference.networking.x-k8s.io/v1alpha1 |
||
kind string |
InferenceModel |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata . |
||
spec InferenceModelSpec |
|||
status InferenceModelStatus |
InferenceModelSpec¶
InferenceModelSpec represents a specific model use case. This resource is managed by the "Inference Workload Owner" persona.
The Inference Workload Owner persona is: a team that trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool sharing compute capacity with other InferenceModels, defined by the Inference Platform Admin.
InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool, if the name is reused, an error will be shown on the status of a InferenceModel that attempted to reuse. The oldest InferenceModel, based on creation timestamp, will be selected to remain valid. In the event of a race condition, one will be selected at random.
Appears in: - InferenceModel
Field | Description | Default | Validation |
---|---|---|---|
modelName string |
The name of the model as the users set in the "model" parameter in the requests. The name should be unique among the workloads that reference the same backend pool. This is the parameter that will be used to match the request with. In the future, we may allow to match on other request parameters. The other approach to support matching on on other request parameters is to use a different ModelName per HTTPFilter. Names can be reserved without implementing an actual model in the pool. This can be done by specifying a target model and setting the weight to zero, an error will be returned specifying that no valid target model is found. |
MaxLength: 253 |
|
criticality Criticality |
Defines how important it is to serve the model compared to other models referencing the same pool. | Default | Enum: [Critical Default Sheddable] |
targetModels TargetModel array |
Allow multiple versions of a model for traffic splitting. If not specified, the target model name is defaulted to the modelName parameter. modelName is often in reference to a LoRA adapter. |
MaxItems: 10 |
|
poolRef PoolObjectReference |
Reference to the inference pool, the pool must exist in the same namespace. | Required: {} |
InferenceModelStatus¶
InferenceModelStatus defines the observed state of InferenceModel
Appears in: - InferenceModel
Field | Description | Default | Validation |
---|---|---|---|
conditions Condition array |
Conditions track the state of the InferencePool. |
InferencePool¶
InferencePool is the Schema for the Inferencepools API
Field | Description | Default | Validation |
---|---|---|---|
apiVersion string |
inference.networking.x-k8s.io/v1alpha1 |
||
kind string |
InferencePool |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata . |
||
spec InferencePoolSpec |
|||
status InferencePoolStatus |
InferencePoolSpec¶
InferencePoolSpec defines the desired state of InferencePool
Appears in: - InferencePool
Field | Description | Default | Validation |
---|---|---|---|
selector object (keys:LabelKey, values:LabelValue) |
Selector uses a map of label to watch model server pods that should be included in the InferencePool. ModelServers should not be with any other Service or InferencePool, that behavior is not supported and will result in sub-optimal utilization. In some cases, implementations may translate this to a Service selector, so this matches the simple map used for Service selectors instead of the full Kubernetes LabelSelector type. |
Required: {} |
|
targetPortNumber integer |
TargetPortNumber is the port number that the model servers within the pool expect to receive traffic from. This maps to the TargetPort in: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort |
Maximum: 65535 Minimum: 0 Required: {} |
InferencePoolStatus¶
InferencePoolStatus defines the observed state of InferencePool
Appears in: - InferencePool
Field | Description | Default | Validation |
---|---|---|---|
conditions Condition array |
Conditions track the state of the InferencePool. |
LabelKey¶
Underlying type: string
Originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731 Duplicated as to not take an unexpected dependency on gw's API.
LabelKey is the key of a label. This is used for validation of maps. This matches the Kubernetes "qualified name" validation that is used for labels.
Valid values include:
- example
- example.com
- example.com/path
- example.com/path.html
Invalid values include:
- example~ - "~" is an invalid character
- example.com. - can not start or end with "."
Validation:
- MaxLength: 253
- MinLength: 1
- Pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$
Appears in: - InferencePoolSpec
LabelValue¶
Underlying type: string
LabelValue is the value of a label. This is used for validation of maps. This matches the Kubernetes label validation rules: * must be 63 characters or less (can be empty), * unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]), * could contain dashes (-), underscores (_), dots (.), and alphanumerics between.
Valid values include:
- MyValue
- my.name
- 123-my-value
Validation:
- MaxLength: 63
- MinLength: 0
- Pattern: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$
Appears in: - InferencePoolSpec
PoolObjectReference¶
PoolObjectReference identifies an API object within the namespace of the referrer.
Appears in: - InferenceModelSpec
Field | Description | Default | Validation |
---|---|---|---|
group string |
Group is the group of the referent. | inference.networking.x-k8s.io | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
kind string |
Kind is kind of the referent. For example "InferencePool". | InferencePool | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
name string |
Name is the name of the referent. | MaxLength: 253 MinLength: 1 Required: {} |
TargetModel¶
TargetModel represents a deployed model or a LoRA adapter. The Name field is expected to match the name of the LoRA adapter (or base model) as it is registered within the model server. Inference Gateway assumes that the model exists on the model server and is the responsibility of the user to validate a correct match. Should a model fail to exist at request time, the error is processed by the Instance Gateway, and then emitted on the appropriate InferenceModel object.
Appears in: - InferenceModelSpec
Field | Description | Default | Validation |
---|---|---|---|
name string |
The name of the adapter as expected by the ModelServer. | MaxLength: 253 |
|
weight integer |
Weight is used to determine the proportion of traffic that should be sent to this target model when multiple versions of the model are specified. |
1 | Maximum: 1e+06 Minimum: 0 |