Apache Druid: PostgreSQL as Metadata storage, and replace ZooKeeper with Kubernetes Extensions

We continue the series of posts about Apache Druid. In the first part, we took a look at Apache Druid itself, its architecture and monitoring; in the second part, we ran a PostgreSQL cluster and set up its monitoring.
Next tasks:
- switch Druid to PostgreSQL as the metadata storage instead of Apache Derby
- get rid of Apache ZooKeeper and replace it with the druid-kubernetes-extensions module
Let’s get started with PostgreSQL.
Configuring Apache Druid with PostgreSQL
See PostgreSQL Metadata Store and Metadata Storage.
PostgreSQL users
Let's go back to the manifests/minimal-master-replica-svcmonitor.yaml file, which was used to create the PostgreSQL cluster, and add a druid user and a druid database:
...
  users:
    zalando: # database owner
      - superuser
      - createdb
    foo_user: [] # role for application foo
    druid:
      - createdb
  databases:
    foo: zalando # dbname: owner
    druid: druid
...
Upgrade the cluster:
$ kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml
Retrieve the password of the druid user:
$ kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
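Kubernetes stores Secret values base64-encoded, which is why the `| base64 -d` step is needed. A quick local round-trip with a dummy value (not the real secret) illustrates what happens:

```shell
# Encode a dummy value the same way Kubernetes stores Secret data,
# then decode it back; the real secret above is decoded identically.
encoded=$(printf 'example-password' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```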
Open port to the PostgreSQL master:
$ kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432
Forwarding from 127.0.0.1:6432 -> 5432
Forwarding from [::1]:6432 -> 5432
Connect:
$ psql -U druid -h localhost -p 6432
Password for user druid:
psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
druid=>
Check the database, which must still be empty at this point:
druid-> \dt
Did not find any relations.
Apache Druid metadata.storage config
Use the same druid-operator/examples/tiny-cluster.yaml config that was used to create the Apache Druid cluster (see Spin Up Druid Cluster).
Currently, it is configured for Apache Derby, so the metadata is stored on the local disk:
...
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
...
For PostgreSQL, we need to specify the connectURI, so find the corresponding Kubernetes Service:
$ kubectl -n test-pg get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
acid-minimal-cluster ClusterIP 10.97.188.225 <none> 5432/TCP 14h
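The in-cluster DNS name used in the connectURI below follows the standard `<service>.<namespace>.svc.cluster.local` pattern that Kubernetes resolves for any Service. A small sketch of how the URI is composed (the shell variables are just for illustration):

```shell
# Compose the JDBC URI from the Service name and its namespace;
# Kubernetes DNS resolves <service>.<namespace>.svc.cluster.local inside the cluster,
# and PostgreSQL's default port 5432 is implied when omitted.
service=acid-minimal-cluster
namespace=test-pg
database=druid
connect_uri="jdbc:postgresql://${service}.${namespace}.svc.cluster.local/${database}"
echo "$connect_uri"
```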
Edit the manifest: comment out the Derby lines, and add a new config with type=postgresql:
...
# Extensions
#
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
# Metadata Store
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
#druid.metadata.storage.connector.createTables=true
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
druid.metadata.storage.connector.createTables=true
...
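Keeping the password in plaintext in the manifest is not great: Druid can instead read it from an environment variable via its environment password provider. A sketch, assuming a METADATA_STORAGE_PASSWORD variable is set on the Druid pods (for example, from the Kubernetes Secret created by the operator):

```ini
# Hypothetical: read the metadata store password from an env var
# instead of hardcoding it in the manifest
druid.metadata.storage.connector.password={"type":"environment","variable":"METADATA_STORAGE_PASSWORD"}
```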
Update the Druid cluster:
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
Check the data in the PostgreSQL database:
druid-> \dt
            List of relations
 Schema |         Name          | Type  | Owner
--------+-----------------------+-------+-------
 public | druid_audit           | table | druid
 public | druid_config          | table | druid
 public | druid_datasource      | table | druid
 public | druid_pendingsegments | table | druid
 public | druid_rules           | table | druid
 public | druid_segments        | table | druid
 public | druid_supervisors     | table | druid
Nice!
If you need to migrate data from Derby to PostgreSQL, see Metadata Migration.
Next, let's get rid of the need for a ZooKeeper instance by replacing it with Kubernetes itself.
Configuring Druid Kubernetes Service Discovery
Documentation for the module is here>>>.
Go back to the druid-operator/examples/tiny-cluster.yaml and update the config: turn off ZooKeeper, add the new druid-kubernetes-extensions extension and the additional parameters:
...
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s
# Zookeeper
#druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
#druid.zk.paths.base=/druid
#druid.zk.service.compress=false
...
Upgrade the cluster:
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
Druid RBAC Role
Add an RBAC Role and a RoleBinding; otherwise, we will get authorization errors like this:
ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}]
Create a manifest from the documentation:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - configmaps
    verbs:
      - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
  - kind: ServiceAccount
    name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io
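Binding the Role to the default ServiceAccount works, but it grants the permissions to every pod in the namespace that runs under default. A tighter variant would use a dedicated ServiceAccount; a sketch, where the druid-sa name is arbitrary and the Druid pods would then need to run under it:

```yaml
# Hypothetical hardening: a dedicated ServiceAccount for the Druid pods,
# bound to the same druid-cluster Role instead of "default"
apiVersion: v1
kind: ServiceAccount
metadata:
  name: druid-sa
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
  - kind: ServiceAccount
    name: druid-sa
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io
```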
Create new resources in Druid’s namespace:
$ kubectl -n druid apply -f druid-serviceaccout.yaml
role.rbac.authorization.k8s.io/druid-cluster created
rolebinding.rbac.authorization.k8s.io/druid-cluster created
And in a minute or two, check the logs again:
…
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].
…
Done.
Originally published at RTFM: Linux, DevOps, and system administration.