Migrating from Ingress NGINX to Envoy Gateway: Handling cert-manager integration
The recent retirement of Ingress NGINX led many people (myself included) to look for alternatives. Since a migration is needed anyway, it makes sense to move to the newer Gateway API, the successor to the older Ingress API. The Ingress API's limited expressiveness forced implementations to rely on a pile of non-standard annotations, which were suboptimal and not portable across controllers.
It turns out that one of the defining features of the Gateway API is its attention to RBAC and multitenancy: it splits apart responsibilities that were blurred together in the Ingress API. This enables more sophisticated RBAC setups, which can accommodate a wider range of organizational complexity.
For organizations using cert-manager with Ingress NGINX to automate TLS certificates from Let's Encrypt, this migration has important implications. The Gateway API fundamentally changes how TLS is configured and managed, requiring a rethink of cert-manager integration strategies.
For this migration, I am using Envoy Gateway, an implementation of the Gateway API based on Envoy.
Scenario 1: Using static certificates
Before moving to cert-manager, let's look at using static certificates in a Gateway API setting. The TLS setup in Gateway API is done at the Gateway level, which is the component intended to be managed by cluster operators.
The Gateway object is usually paired with a load balancer: via a Service of type LoadBalancer, via a NodePort linked to an external load balancer, or through other implementation-specific mechanisms. All of these can incur costs and add complexity, so there is often a single central Gateway for the entire cluster.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: test-gateway
  namespace: default
spec:
  gatewayClassName: test
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    hostname: "example.com"
    allowedRoutes:
      namespaces:
        from: All # Restrict this
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "example.com"
    allowedRoutes:
      namespaces:
        from: All # Restrict this
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: example-com
On the other hand, services to be exposed are defined at the application level using an HTTPRoute object:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app
  namespace: app
spec:
  parentRefs:
  - name: test-gateway
    namespace: default
    sectionName: https
  hostnames:
  - example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: app
      port: 80
The main difference that stands out compared to legacy Ingress is that the hostnames are specified twice: at the Gateway level (which also handles TLS) and at the HTTPRoute level, to attach the route to the gateway.
Scenario 2: Using cert-manager
cert-manager supports the Gateway API and can obtain certificates from Let's Encrypt (and other certificate issuers). Since TLS is managed at the Gateway level, cert-manager also integrates at the Gateway level. It is activated using an annotation:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  ...
cert-manager inspects the hostnames of the Gateway's listeners and requests matching certificates from Let's Encrypt. The referenced certificate Secrets are populated by cert-manager once the issuing process completes successfully.
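The annotation refers to a ClusterIssuer by name. A minimal sketch of what letsencrypt-prod might look like when solving HTTP-01 challenges through the Gateway itself, using cert-manager's gatewayHTTPRoute solver (the email address and Secret names here are illustrative, and cert-manager's Gateway API support must be enabled in its configuration):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # contact address (placeholder)
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # ACME account key Secret (illustrative name)
    solvers:
    - http01:
        gatewayHTTPRoute:
          # Challenge HTTPRoutes are attached to this Gateway
          parentRefs:
          - name: test-gateway
            namespace: default
            kind: Gateway
```

With this solver, cert-manager creates temporary HTTPRoutes attached to the Gateway to answer the ACME HTTP-01 challenges, so no Ingress resource is needed at all.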
This approach works, but it does not allow self-service provisioning of TLS certificates by application teams, something that was possible with Ingress (although with questionable security, as one application could intercept traffic intended for another). This centralized approach does not scale well in mid-sized organizations using a central Gateway, as every new domain requires cluster operator intervention.
One alternative would be for each application team to own its own Gateway, but this introduces cost and complexity from setting up dedicated load balancers, public IPs, and additional infrastructure. Currently, the Gateway API does not support a better model for self-service certificate operations.
In the following sections we will explore a non-standard approach available in Envoy Gateway and a proposed feature that will extend the self-service model to standard Gateway API.
Scenario 3: Self-service certificates using multiple Gateways and Envoy Gateway merged deployments
One way to implement self-service is to use multiple Gateways, one per application. However, this typically multiplies infrastructure resources (like load balancers) and associated complexity and cost. Envoy Gateway offers a non-standard approach to solve this: the merged gateway deployment mode. Using a custom configuration object, it is possible to serve all Gateways belonging to the same class from a single load balancer:
# Custom configuration
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: merged-eg
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: test
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  # Reference to custom configuration
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: merged-eg
    namespace: envoy-gateway-system
The main limitation is that hostnames cannot be shared between different application teams. This could be problematic in some scenarios, for example when one application serves example.com/app1 and another serves example.com/app2.
This approach enables self-service TLS configuration by allowing application teams to manage their own Gateway objects. However, this is a non-standard approach specific to Envoy Gateway.
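With merged deployments, a self-service setup might look like the following sketch: each team owns a Gateway in its own namespace, referencing the shared GatewayClass and carrying its own cert-manager annotation (the team-a namespace, hostname, and Secret name are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: team-a-gateway
  namespace: team-a  # owned and managed by the application team
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: test  # the merged class; all its Gateways share one load balancer
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "team-a.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: team-a-example-com  # populated by cert-manager
    allowedRoutes:
      namespaces:
        from: Same  # routes stay within the team's namespace
```

Since the team controls the whole Gateway object, it can add hostnames and rotate certificates without involving the cluster operators.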
Scenario 4: Self-service using cert-manager (not yet available)
An ongoing effort is underway to introduce a new feature to the Gateway API called ListenerSet. A ListenerSet will move hostname configuration outside the Gateway object, which in turn will allow delegating TLS certificate management to application teams in a secure way, thus enabling self-service TLS certificates.
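As a rough sketch of where this is heading: the current experimental version of the feature (GEP-1713) defines an XListenerSet resource that attaches listeners to an existing Gateway. The API is alpha and subject to change, and all names below are illustrative:

```yaml
apiVersion: gateway.networking.x-k8s.io/v1alpha1
kind: XListenerSet
metadata:
  name: team-a-listeners
  namespace: team-a  # owned by the application team
spec:
  # Attach to the cluster operators' central Gateway,
  # which must explicitly allow listeners from this namespace
  parentRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: test-gateway
    namespace: default
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "team-a.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: team-a-example-com  # to be populated by cert-manager
```

The key difference from the merged-gateway approach is that the load balancer and the Gateway stay central, while hostname and TLS configuration become per-team resources that are standard across implementations.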
cert-manager plans to support ListenerSets, with initial alpha support expected by January 2026. Stay tuned. I'll test it when it becomes available.