We had an indexer cluster, ladies and gentlemen!
This entry will be Splunk related. Heavily Splunk related. But I'll open with how I was asked to present my Terraform code for setting up a practice lab in AWS at our company's weekly tech talk. I was pretty proud of the slide deck I put together. When my turn came, I shared my desktop and started running through it. Halfway through, someone piped up, “Are we supposed to be seeing something? It sounds like you're expecting us to see something.” Turns out, they could see my gallery view of the meeting, 'cause my slide deck was in another workspace. And then, when I tried to share Chrome specifically instead (where I was actually presenting from), our meeting application complained that it needed more permissions, permissions it wouldn't get until I restarted it, and restarting meant leaving the meeting. I got through it, in the end, but I felt a bit silly.
I ran that code today, thinking I'd get much further along in my practising. And, initially, things looked good. But once I'd finished the initial indexer cluster configuration, I wasn't clear on what the contents of that initial cluster bundle, the one to be sent to the peers, should actually be. The Clustering Administration class had quickly gone from everything's working to let's bust an indexer and see how failover works. I did eventually find a relevant page, deep in Splunk Docs, only to find out that, while the master node was happy to validate said bundle, it was not about to push anything out to the peers.
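For anyone playing along at home, the master-side dance is roughly as follows; this is a sketch from my notes rather than gospel, and exact output varies by Splunk version:

```
# Run on the master (cluster manager) node, from $SPLUNK_HOME/bin
./splunk validate cluster-bundle       # sanity-check what's sitting in master-apps
./splunk apply cluster-bundle          # actually push the bundle out to the peers
./splunk show cluster-bundle-status    # watch the push (and any rolling restart) progress
```

Validating on its own distributes nothing, which was precisely my confusion: the apply step is a separate, deliberate action.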
Early on I'd noticed that the peers were in automatic detention, but, in my mind, that went in the category of, oh, they don't have any data to index yet. My colleague had spent a long time troubleshooting a problem that essentially boiled down to, oh, you won't see that until you start forwarding data to the cluster. I decided all sorts of anomalies were acceptable while my cluster was still sitting in the starting blocks, as it were.
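If you'd rather not wave anomalies away like I did, the master will report on its peers; something along these lines (again, a sketch, and the output format differs between versions):

```
# Run on the master node, from $SPLUNK_HOME/bin
./splunk show cluster-status    # cluster health summary, peer by peer
./splunk list cluster-peers     # raw peer list, including each peer's status
```

That status output is, I believe, where the automatic detention flag surfaces, long before any data does.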
Silly me. Silly, silly me. Thankfully, one of our longest-serving splunkers took pity on my whining in the team chat at 10pm on her Friday night. Turns out, Splunk likes to have plenty of breathing room by default. Five gigabytes of it, in fact. Without that much free disk, indexing just... stops. EBS volumes default to 8GB. The typical Splunk Enterprise install comes in at about 3.5GB. Notice a problem?
Once I'd gone 'round my peers and knocked that default well on the head (minFreeSpace, under the [diskUsage] stanza in server.conf, for your notes), making sure that while I changed it in each peer's slave-apps, I also changed it in the master's master-apps bundle directory, the one that would soon be clobbering those peer-side edits (I hoped), everything was tickety-boo.
And, once my runbook was up-to-date, I typed terraform destroy again. D-:
End of Day 20
—
jlj #100DaysToOffload
I'm writing this as part of the 100 Days To Offload project; join us at: https://100daystooffload.com/