Production guidelines on Kubernetes
Cluster and capacity requirements
Dapr support for Kubernetes is aligned with the Kubernetes Version Skew Policy.
Use the following resource settings as a starting point. Requirements vary depending on cluster size, number of pods, and other factors, so test in your own environment to find the right values. In production, it’s recommended not to add memory limits to the Dapr control plane components, to avoid OOMKilled pod statuses.
Deployment | CPU | Memory |
---|---|---|
Operator | Limit: 1, Request: 100m | Request: 100Mi |
Sidecar Injector | Limit: 1, Request: 100m | Request: 30Mi |
Sentry | Limit: 1, Request: 100m | Request: 30Mi |
Placement | Limit: 1, Request: 250m | Request: 75Mi |
Note
For more information, refer to the Kubernetes documentation on CPU and Memory resource units and their meaning.
Helm
When installing Dapr using Helm, no default limit/request values are set. Each component has a resources option (for example, dapr_dashboard.resources), which you can use to tune the Dapr control plane to fit your environment.
The Helm chart readme has detailed information and examples.
For local/dev installations, you might want to skip configuring the resources options.
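As a minimal sketch, the starting-point values from the table above could be written in a Helm values file like the one below. The top-level keys dapr_operator, dapr_sidecar_injector, dapr_sentry, and dapr_placement are assumed to follow the same naming pattern as dapr_dashboard.resources; verify them against the Helm chart readme for your chart version.

```yaml
# values.yml (sketch): starting-point resources for the Dapr control plane.
# The component key names are assumptions; check the Helm chart readme.
# No memory limits are set, in line with the production guidance above.
dapr_operator:
  resources:
    limits:
      cpu: "1"
    requests:
      cpu: 100m
      memory: 100Mi
dapr_sidecar_injector:
  resources:
    limits:
      cpu: "1"
    requests:
      cpu: 100m
      memory: 30Mi
dapr_sentry:
  resources:
    limits:
      cpu: "1"
    requests:
      cpu: 100m
      memory: 30Mi
dapr_placement:
  resources:
    limits:
      cpu: "1"
    requests:
      cpu: 250m
      memory: 75Mi
```

Treat these numbers as a baseline only and adjust them after measuring your own cluster.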
Optional components
The following Dapr control plane deployments are optional:
- Placement: For using Dapr Actors
- Sentry: For mTLS for service-to-service invocation
- Dashboard: For an operational view of the cluster
Sidecar resource settings
Set the resource assignments for the Dapr sidecar using the supported annotations. The specific annotations related to resource constraints are:
- dapr.io/sidecar-cpu-limit
- dapr.io/sidecar-memory-limit
- dapr.io/sidecar-cpu-request
- dapr.io/sidecar-memory-request
If not set, the Dapr sidecar runs without resource settings, which may lead to issues. For a production-ready setup, it’s strongly recommended to configure these settings.
Example settings for the Dapr sidecar in a production-ready setup:
CPU | Memory |
---|---|
Limit: 300m, Request: 100m | Limit: 1000Mi, Request: 250Mi |
The CPU and memory limits above account for Dapr supporting a high number of I/O bound operations. Use a monitoring tool to get a baseline for the sidecar (and app) containers and tune these settings based on those baselines.
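The sketch below shows how the example settings from the table could be applied through the four annotations listed earlier. The Deployment name, labels, and container image are hypothetical placeholders; only the annotation names and values come from this page.

```yaml
# Sketch of a Deployment whose Dapr sidecar uses the example resource settings.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp                      # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodeapp
  template:
    metadata:
      labels:
        app: nodeapp
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "nodeapp"
        # Sidecar resource settings from the example table
        dapr.io/sidecar-cpu-limit: "300m"
        dapr.io/sidecar-cpu-request: "100m"
        dapr.io/sidecar-memory-limit: "1000Mi"
        dapr.io/sidecar-memory-request: "250Mi"
    spec:
      containers:
        - name: nodeapp
          image: example.com/nodeapp:latest   # hypothetical image
```

The annotations go on the pod template, not on the Deployment's own metadata, because the sidecar injector reads them from the pod.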
For more details on configuring resources in Kubernetes, see the Kubernetes guides on assigning memory and CPU resources to containers and pods.
Note
Since Dapr is intended to do much of the I/O heavy lifting for your app, the resources given to Dapr drastically reduce the resource allocations for the application.
Setting soft memory limits on Dapr sidecar
Set soft memory limits on the Dapr sidecar when you’ve set up memory limits. With a soft memory limit, the sidecar’s garbage collector frees memory once usage exceeds the soft limit, instead of waiting for the heap to grow to double the size it had after the previous collection. Waiting is the default behavior of the garbage collector used in Go, and it can lead to OOM Kill events.
For example, for an app with app-id nodeapp and a memory limit of 1000Mi, you can use the following in your pod annotations:
  annotations:
    dapr.io/enabled: "true"
    dapr.io/app-id: "nodeapp"
    # our daprd memory settings
    dapr.io/sidecar-memory-limit: "1000Mi"   # your memory limit
    dapr.io/env: "GOMEMLIMIT=900MiB"         # 90% of your memory limit. Also notice the suffix "MiB" instead of "Mi"
In this example, the soft limit is set to 90% of the memory limit, leaving the recommended 5-10% for other services.
The GOMEMLIMIT environment variable allows certain suffixes for the memory size: B, KiB, MiB, GiB, and TiB.
High availability mode
When deploying Dapr in a production-ready configuration, it’s best to deploy with a high availability (HA) configuration of the control plane. This creates three replicas of each control plane pod in the dapr-system namespace, allowing the Dapr control plane to retain three running instances and survive individual node failures and other outages.
For a new Dapr deployment, HA mode can be set with either:
- The Dapr CLI, or
- Helm charts
For an existing Dapr deployment, you can enable HA mode in a few extra steps.
Individual service HA Helm configuration
You can configure HA mode via Helm across all services by setting the global.ha.enabled flag to true. By default, --set global.ha.enabled=true is fully respected and cannot be overridden, making it impossible to simultaneously have either the placement or scheduler service as a single instance.
Note: HA for scheduler and placement services is not the default setting.
To scale scheduler and placement to three instances independently of the global.ha.enabled flag, set global.ha.enabled to false and dapr_scheduler.ha and dapr_placement.ha to true. For example:
helm upgrade --install dapr dapr/dapr \
  --version=1.14 \
  --namespace dapr-system \
  --create-namespace \
  --set global.ha.enabled=false \
  --set dapr_scheduler.ha=true \
  --set dapr_placement.ha=true \
  --wait
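The same settings can be kept in a values file rather than passed as --set flags. The fragment below is a sketch that mirrors the three flags in the command above; the file name is up to you.

```yaml
# values.yml (sketch): HA for scheduler and placement only.
global:
  ha:
    enabled: false   # leave the other control plane services as single instances
dapr_scheduler:
  ha: true           # three scheduler instances
dapr_placement:
  ha: true           # three placement instances
```

Pass the file to Helm with --values in place of the three --set flags.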
Setting cluster critical priority class name for control plane services
In some scenarios, nodes may be under memory and/or CPU pressure, and the Dapr control plane pods might get selected for eviction. To prevent this, you can set a critical priority class name for the Dapr control plane pods. This ensures that the Dapr control plane pods are not evicted unless all other pods with lower priority are evicted.
Learn more about Protecting Mission-Critical Pods.
There are two built-in critical priority classes in Kubernetes:
- system-cluster-critical
- system-node-critical (highest priority)
It’s recommended to set the priorityClassName to system-cluster-critical for the Dapr control plane pods.
For a new Dapr control plane deployment, the system-cluster-critical priority class can be set via the Helm value global.priorityClassName.
This priority class can be set with both the Dapr CLI and Helm charts, using the helm --set global.priorityClassName=system-cluster-critical argument.
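If you keep your Helm settings in a values file, the equivalent of that --set argument is the short fragment below. It is a sketch that uses only the global.priorityClassName value named above.

```yaml
# values.yml (sketch): assign the critical priority class to the control plane pods.
global:
  priorityClassName: system-cluster-critical
```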
Dapr version < 1.14
For versions of Dapr below v1.14, it’s recommended that you add a ResourceQuota to the Dapr control plane namespace. This prevents problems associated with scheduling pods where the cluster may be configured with limitations on which pods can be assigned high priority classes. For v1.14 onwards the Helm chart adds this automatically.
If you have Dapr installed in namespace dapr-system, you can create a ResourceQuota with the following content:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dapr-system-critical-quota
  namespace: dapr-system
spec:
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: [system-cluster-critical]
Deploy Dapr with Helm
Visit the full guide on deploying Dapr with Helm.
Parameters file
It’s recommended to create a values file instead of specifying parameters on the command line. Check the values file into source control so that you can track its changes.
See a full list of available parameters and settings.
The following command runs three replicas of each control plane service in the dapr-system namespace.
# Add/update an official Dapr Helm repo.
helm repo add dapr https://dapr.github.io/helm-charts/
# or add/update a private Dapr Helm repo.
helm repo add dapr http://helm.custom-domain.com/dapr/dapr/ \
  --username=xxx --password=xxx
helm repo update
# See which chart versions are available
helm search repo dapr --devel --versions
# create a values file to store variables
touch values.yml
cat << EOF >> values.yml
global:
  ha:
    enabled: true
EOF
# run install/upgrade
helm install dapr dapr/dapr \
  --version=<Dapr chart version> \
  --namespace dapr-system \
  --create-namespace \
  --values values.yml \
  --wait
# verify the installation
kubectl get pods --namespace dapr-system
Note
The example above uses helm install and helm upgrade. You can also run helm upgrade --install to dynamically determine whether to install or upgrade.
The Dapr Helm chart automatically deploys with affinity for nodes with the label kubernetes.io/os=linux. You can deploy the Dapr control plane to Windows nodes. For more information, see Deploying to a Hybrid Linux/Windows K8s Cluster.
Upgrade Dapr with Helm
Dapr supports zero-downtime upgrades. An upgrade consists of the following steps.
Upgrade the CLI (recommended)
Upgrading the CLI is optional, but recommended.
- Download the latest version of the CLI.
- Verify the Dapr CLI is in your path.
Upgrade the control plane
Upgrade Dapr on a Kubernetes cluster.
Update the data plane (sidecars)
Update pods that are running Dapr to pick up the new version of the Dapr runtime.
- Issue a rollout restart command for any deployment that has the dapr.io/enabled annotation:
  kubectl rollout restart deploy/<Application deployment name>
- View a list of all your Dapr enabled deployments via either:
  - The Dapr Dashboard
  - Running the following command using the Dapr CLI:
    dapr list -k
    APP ID     APP PORT  AGE  CREATED
    nodeapp    3000      16h  2020-07-29 17:16.22
Enable high availability in an existing Dapr deployment
Enabling HA mode for an existing Dapr deployment requires two steps:
- Delete the existing placement stateful set.
  kubectl delete statefulset.apps/dapr-placement-server -n dapr-system
  You delete the placement stateful set because, in HA mode, the placement service adds Raft for leader election. However, Kubernetes only allows a limited set of fields in stateful sets to be patched, so the upgrade of the placement service would otherwise fail.
  Deletion of the existing placement stateful set is safe. The agents reconnect and re-register with the newly created placement service, which persists its table in Raft.
- Issue the upgrade command.
  helm upgrade dapr ./charts/dapr -n dapr-system --set global.ha.enabled=true
Recommended security configuration
When properly configured, Dapr ensures secure communication and can make your application more secure with a number of built-in features.
Verify your production-ready deployment includes the following settings:
- Mutual Authentication (mTLS) is enabled. Dapr has mTLS on by default. Learn more about how to bring your own certificates.
- App to Dapr API authentication is enabled. This is the communication between your application and the Dapr sidecar. To secure the Dapr API from unauthorized application access, enable Dapr’s token-based authentication.
- Dapr to App API authentication is enabled. This is the communication between Dapr and your application. Let Dapr know that it is communicating with an authorized application using token authentication.
- Component secret data is configured in a secret store and not hard-coded in the component YAML file. Learn how to use secrets with Dapr components.
- The Dapr control plane is installed on a dedicated namespace, such as dapr-system.
- Components are scoped to the applications that need them. Dapr supports component scopes, but this is not a required practice. Learn more about component scopes.
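To illustrate the secret store and component scope items in the checklist above, here is a minimal sketch of a component definition. The component name, Redis host, secret name, secret key, and app ID are all hypothetical; it assumes a Kubernetes secret named redis-secret already exists in the same namespace.

```yaml
# Sketch: a state store component that reads its password from a secret
# instead of hard-coding it, and that only the "nodeapp" application can load.
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore                 # hypothetical component name
spec:
  type: state.redis
  version: v1
  metadata:
    - name: redisHost
      value: redis-master:6379     # hypothetical host
    - name: redisPassword
      secretKeyRef:                # resolved from the secret store, not stored in this file
        name: redis-secret         # hypothetical secret name
        key: redis-password        # hypothetical key within that secret
scopes:
  - nodeapp                        # only this app ID may use the component
```

Leaving out scopes makes the component available to every Dapr-enabled application in the namespace.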
Recommended Placement service configuration
The Placement service is a component in Dapr, responsible for disseminating information about actor addresses to all Dapr sidecars via a placement table. For more information, see the Placement service documentation.
When running in production, it’s recommended to configure the Placement service with the following values:
- High availability. Ensure the Placement service is highly available (three replicas) and can survive individual node failures. Helm chart value: dapr_placement.ha=true
- In-memory logs. Use in-memory Raft log store for faster writes. The tradeoff is more placement table disseminations (and thus, network traffic) in an eventual Placement service pod failure. Helm chart value: dapr_placement.cluster.forceInMemoryLog=true
- No metadata endpoint. Disable the unauthenticated /placement/state endpoint, which exposes placement table information for the Placement service. Helm chart value: dapr_placement.metadataEnabled=false
- Timeouts. Control the sensitivity of network connectivity between the Placement service and the sidecars using the timeout values below. Default values are set, but you can adjust these based on your network conditions.
  - dapr_placement.keepAliveTime sets the interval at which the Placement service sends keep alive pings to Dapr sidecars on the gRPC stream to check if the connection is still alive. Lower values will lead to shorter actor rebalancing time in case of pod loss/restart, but higher network traffic during normal operation. Accepts values between 1s and 10s. Default is 2s.
  - dapr_placement.keepAliveTimeout sets the timeout period for Dapr sidecars to respond to the Placement service’s keep alive pings before the Placement service closes the connection. Lower values will lead to shorter actor rebalancing time in case of pod loss/restart, but higher network traffic during normal operation. Accepts values between 1s and 10s. Default is 3s.
  - dapr_placement.disseminateTimeout sets the timeout period for dissemination to be delayed after actor membership change (usually related to pod restarts) to avoid excessive dissemination during multiple pod restarts. Higher values will reduce the frequency of dissemination, but delay the table dissemination. Accepts values between 1s and 3s. Default is 2s.
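The recommended Placement values above could be collected in a Helm values file as sketched below. The key names come from the Helm chart values listed in this section, and the timeouts are shown at their stated defaults. Writing the durations as strings such as 2s is an assumption based on how they appear on this page.

```yaml
# values.yml (sketch): recommended Placement service settings for production.
dapr_placement:
  ha: true                    # three replicas
  cluster:
    forceInMemoryLog: true    # in-memory Raft log store
  metadataEnabled: false      # disable the unauthenticated /placement/state endpoint
  keepAliveTime: 2s           # default; accepts 1s to 10s
  keepAliveTimeout: 3s        # default; accepts 1s to 10s
  disseminateTimeout: 2s      # default; accepts 1s to 3s
```

Change the timeout values only after observing how your network behaves under normal load.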
Service account tokens
By default, Kubernetes mounts a volume containing a Service Account token in each container. Applications can use this token, whose permissions vary depending on the configuration of the cluster and namespace, among other things, to perform API calls against the Kubernetes control plane.
When creating a new Pod (or a Deployment, StatefulSet, Job, etc), you can disable auto-mounting the Service Account token by setting automountServiceAccountToken: false in your pod’s spec.
It’s recommended that you consider deploying your apps with automountServiceAccountToken: false to improve the security posture of your pods, unless your apps depend on having a Service Account token. For example, you may need a Service Account token if:
- Your application needs to interact with the Kubernetes APIs.
- You are using Dapr components that interact with the Kubernetes APIs; for example, the Kubernetes secret store or the Kubernetes Events binding.
Thus, Dapr does not set automountServiceAccountToken: false automatically for you. However, in all situations where the Service Account is not required by your solution, it’s recommended that you set this option in the pod’s spec.
Note
Initializing Dapr components using component secrets stored as Kubernetes secrets does not require a Service Account token, so you can still set automountServiceAccountToken: false in this case. Only calling the Kubernetes secret store at runtime, using the Secrets management building block, is impacted.
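As a sketch of where this field goes, the Deployment below disables token auto-mounting for a Dapr-enabled app. The Deployment name, labels, and image are hypothetical; automountServiceAccountToken is a standard field of the Kubernetes pod spec.

```yaml
# Sketch: a Dapr-enabled Deployment that does not mount a Service Account token.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp                              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodeapp
  template:
    metadata:
      labels:
        app: nodeapp
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "nodeapp"
    spec:
      automountServiceAccountToken: false    # set on the pod spec, not on a container
      containers:
        - name: nodeapp
          image: example.com/nodeapp:latest  # hypothetical image
```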
Tracing and metrics configuration
Tracing and metrics are enabled in Dapr by default. It’s recommended that you set up distributed tracing and metrics for your applications and the Dapr control plane in production.
If you already have your own observability setup, you can disable tracing and metrics for Dapr.
Tracing
Configure a tracing backend for Dapr.
Metrics
For metrics, Dapr exposes a Prometheus endpoint listening on port 9090, which can be scraped by Prometheus.
Set up Prometheus, Grafana, and other monitoring tools with Dapr.
Injector watchdog
The Dapr Operator service includes an injector watchdog, which can be used to detect and remediate situations where your application’s pods may be deployed without the Dapr sidecar (the daprd container). For example, it can assist with recovering the applications after a total cluster failure.
The injector watchdog is disabled by default when running Dapr in Kubernetes mode. However, you should consider enabling it with the appropriate values for your specific situation.
Refer to the Dapr operator service documentation for more details on the injector watchdog and how to enable it.
Configure seccompProfile for sidecar containers
By default, the Dapr sidecar injector injects a sidecar without any seccompProfile. However, for the Dapr sidecar container to run successfully in a namespace with the Restricted profile, the sidecar container needs securityContext.seccompProfile.Type to not be nil.
Refer to the Arguments and Annotations overview to set the appropriate seccompProfile on the sidecar container.
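A hedged sketch of what this could look like in pod annotations follows. The annotation name dapr.io/sidecar-seccomp-profile-type is an assumption not stated on this page, so confirm it in the Arguments and Annotations overview for your Dapr version. RuntimeDefault is one of the standard Kubernetes seccomp profile types.

```yaml
# Sketch: pod template annotations requesting a seccomp profile for the sidecar.
# The seccomp annotation name is an assumption; verify it before use.
annotations:
  dapr.io/enabled: "true"
  dapr.io/app-id: "nodeapp"                             # hypothetical app ID
  dapr.io/sidecar-seccomp-profile-type: "RuntimeDefault"
```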
Best Practices
Watch this video for a deep dive into the best practices for running Dapr in production with Kubernetes.