Monitoring
Monitoring your OpenShift clusters is critical to the health of the environment and the quality of its services. It helps ensure that all deployed workloads are running smoothly and that the environment is properly sized.
OpenShift Monitoring Service (Prometheus/Grafana)
OpenShift Container Platform includes a pre-installed monitoring stack based on Prometheus and Grafana. MAS also provides application-level Prometheus metrics and a set of Grafana dashboards for application health. More installation and configuration details can be found in IBM MAS Monitoring.
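As a rough illustration of how application-level metrics reach Prometheus, a workload is typically scraped through a ServiceMonitor resource once user workload monitoring is enabled. The sketch below is generic and hypothetical: the name, namespace, label, port name, and path are placeholders chosen for illustration and are not taken from MAS.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app-monitor        # hypothetical name
  namespace: example-namespace     # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: example-app             # hypothetical label on the target Service
  endpoints:
    - port: metrics                # name of the Service port exposing metrics (assumed)
      path: /metrics               # assumed metrics path
      interval: 30s                # assumed scrape interval
```

The selector must match the labels on the Service that fronts the application pods, and the port is referenced by its name in that Service.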
Best practice for OpenShift Monitoring Service
- Enable user workload monitoring:
  enableUserWorkload: true
- Consider increasing the Prometheus retention period, whose default value is 24h, and adding persistent volumes so that metrics survive pod restarts.
- Consider changing Alertmanager's storage class and size.
Below is a sample cluster-monitoring-config ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      retention: 90d
      # Persistent storage for metrics; a volume claim requests only a storage size
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          resources:
            requests:
              storage: 300Gi
      # CPU and memory for the Prometheus container
      resources:
        requests:
          cpu: 200m
          memory: 2Gi
        limits:
          cpu: 2
          memory: 4Gi
    alertmanagerMain:
      # Persistent storage for Alertmanager
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          resources:
            requests:
              storage: 20Gi
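Enabling user workload monitoring starts a separate Prometheus instance for user-defined projects, and that instance is configured through its own ConfigMap rather than cluster-monitoring-config. The following is a minimal sketch of that ConfigMap. The retention period, storage class, and volume size are assumptions for illustration (the storage class is reused from the sample above), not recommended values.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Assumed value; size retention to your own needs
      retention: 30d
      volumeClaimTemplate:
        spec:
          # Assumed storage class, reused from the sample above
          storageClassName: nfs-client
          resources:
            requests:
              # Assumed size; depends on metric volume and retention
              storage: 100Gi
```

Keeping the two ConfigMaps separate lets platform metrics and application metrics have different retention and storage sizing.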
Note
- Besides the OpenShift Monitoring Service (Prometheus/Grafana), there are paid solutions such as IBM Instana, New Relic, and Datadog that also support OCP.
- If the cluster is cloud-based, consider using the cloud provider's monitoring tools for additional information on network, disk, and managed services, e.g. AWS CloudWatch or IBM Log Analysis.