ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.

Apache Druid: PostgreSQL as Metadata storage, and replace ZooKeeper with Kubernetes Extensions

We continue the series of posts about Apache Druid. In the first part, we took a look at Apache Druid itself, its architecture, and monitoring; in the second part, we ran a PostgreSQL cluster and set up its monitoring.

Next tasks: configure Apache Druid to use PostgreSQL as its metadata storage, and replace ZooKeeper with the druid-kubernetes-extensions module.

Let’s get started with PostgreSQL.

Configuring Apache Druid with PostgreSQL

See PostgreSQL Metadata Store and Metadata Storage.

PostgreSQL users

Let’s go back to the manifests/minimal-master-replica-svcmonitor.yaml file that was used to create the PostgreSQL cluster, and add a druid user and a druid database:

...
users:
  zalando:  # database owner
  - superuser
  - createdb
  foo_user: []  # role for application foo
  druid:
  - createdb
databases:
  foo: zalando  # dbname: owner
  druid: druid
...

Upgrade the cluster:

$ kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml

Retrieve the password of the druid user:

$ kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
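The long secret name above is not arbitrary: the Zalando postgres-operator creates one Secret per user, named after the user and the cluster. A minimal sketch of how the name is composed (the user and cluster names come from the previous steps; the kubectl invocation at the end is the same one shown above and requires cluster access):

```shell
# Zalando postgres-operator secret naming scheme:
#   {username}.{clustername}.credentials.postgresql.acid.zalan.do
PG_USER=druid
PG_CLUSTER=acid-minimal-cluster
SECRET="${PG_USER}.${PG_CLUSTER}.credentials.postgresql.acid.zalan.do"
echo "${SECRET}"
# With cluster access, decode the password from that secret:
# kubectl -n test-pg get secret "${SECRET}" -o 'jsonpath={.data.password}' | base64 -d
```

The same scheme gives you the secret names for the zalando and foo_user roles defined in the manifest.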

Open a port-forward to the PostgreSQL master:

$ kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432
Forwarding from 127.0.0.1:6432 -> 5432
Forwarding from [::1]:6432 -> 5432

Connect:

$ psql -U druid -h localhost -p 6432
Password for user druid:
psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
druid=>

Check the database, which must still be empty:

druid=> \dt
Did not find any relations.

Apache Druid metadata.storage config

Use the same druid-operator/examples/tiny-cluster.yaml config that was used to create the Apache Druid cluster (see Spin Up Druid Cluster).

Currently, it is configured for Derby, so the metadata is stored on the local disk:

...
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
...

For PostgreSQL, we need to specify the connectURI, so find the corresponding Kubernetes Service:

$ kubectl -n test-pg get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
acid-minimal-cluster ClusterIP 10.97.188.225 <none> 5432/TCP 14h

Edit the manifest: comment out the Derby lines, and add a new config with type=postgresql:

...
# Extensions
#
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
# Metadata Store
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
#druid.metadata.storage.connector.createTables=true

druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
druid.metadata.storage.connector.createTables=true
...
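The host in the connectURI is simply the in-cluster DNS name of the Service found above. A minimal sketch of how it is composed (service, namespace, and database names are the ones from the previous steps; the commented kubectl command is a hypothetical way to test connectivity from inside the cluster):

```shell
# In-cluster DNS name of a Service: <service>.<namespace>.svc.cluster.local
SVC=acid-minimal-cluster
NS=test-pg
DB=druid
URI="jdbc:postgresql://${SVC}.${NS}.svc.cluster.local/${DB}"
echo "${URI}"
# With cluster access, verify the connection from a throwaway pod, e.g.:
# kubectl -n "${NS}" run psql-test --rm -it --image=postgres:13 -- \
#   psql -h "${SVC}.${NS}.svc.cluster.local" -U druid -d "${DB}"
```

Note that the password is stored in the manifest in plaintext here; for anything beyond a test cluster, consider injecting it from the Kubernetes Secret instead.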

Update the Druid cluster:

$ kubectl -n druid apply -f examples/tiny-cluster.yaml

Check the data in the PostgreSQL database:

druid=> \dt
              List of relations
 Schema |         Name          | Type  | Owner
--------+-----------------------+-------+-------
 public | druid_audit           | table | druid
 public | druid_config          | table | druid
 public | druid_datasource      | table | druid
 public | druid_pendingsegments | table | druid
 public | druid_rules           | table | druid
 public | druid_segments        | table | druid
 public | druid_supervisors     | table | druid

Nice!

If you need to migrate data from Derby to PostgreSQL, see Metadata Migration.

Next, let’s get rid of the need for a ZooKeeper instance by replacing it with Kubernetes itself.

Configuring Druid Kubernetes Service Discovery

See the documentation for the druid-kubernetes-extensions module.

Go back to the druid-operator/examples/tiny-cluster.yaml and update the config: turn off ZooKeeper, add the new druid-kubernetes-extensions extension, and set a few additional parameters:

...
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s

# Zookeeper
#druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
#druid.zk.paths.base=/druid
#druid.zk.service.compress=false
...

Upgrade the cluster:

$ kubectl -n druid apply -f examples/tiny-cluster.yaml

Druid RBAC Role

Add an RBAC Role and a RoleBinding; otherwise, we will get authorization errors like this:

ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}]

Create a manifest from the documentation:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io

Create new resources in Druid’s namespace:

$ kubectl -n druid apply -f druid-serviceaccout.yaml
role.rbac.authorization.k8s.io/druid-cluster created
rolebinding.rbac.authorization.k8s.io/druid-cluster created
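Instead of waiting for the logs, you can also ask the API server directly whether the permission from the error message is now granted, using kubectl auth can-i. A sketch (the ServiceAccount and namespace are taken from the 403 error above; the composed command requires cluster access to actually run):

```shell
# The identity from the 403 error: system:serviceaccount:<namespace>:<serviceaccount>
SA="system:serviceaccount:druid:default"
CHECK="kubectl auth can-i watch pods --as=${SA} -n druid"
echo "${CHECK}"
# Running this against the cluster should print "yes" once the RoleBinding is active.
```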

And in a minute or two check the logs again:


2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].

Done.
