SPLK-1002 Splunk Core Certified Power User – Splunk Indexer And Search Head Clustering

  1. Introduction to Clustering and Indexer Clustering Use Case

Hi. In this module we’ll be discussing one of the most important configurations of Splunk, that is, clustering. The term clustering refers to a set of nodes performing identical activities in order to achieve high availability, data integrity, or a performance boost. We’ll be going through the different ways we can cluster Splunk instances and what the benefits of each are. Mainly we have the indexer cluster, where data is replicated between multiple indexers. As a further breakdown of the indexer cluster, we have the single-site cluster and the multisite cluster, which we’ll be discussing in our next videos.

The indexer cluster is mainly used for data availability, so that if at any point one of your indexers goes down, your Splunk environment will not be impacted and your searches will not return incomplete results. The next one is search head clustering, where the configuration of reports, dashboards, and alerts, including your saved searches, is replicated to the other search head instances. Let’s say one of your search heads goes down: the other search heads can still run your reports, alerts, and dashboards and continue the operation without any impact.

And the last one is heavy forwarder clustering. In some major organizations, heavy forwarders are in place in the form of a cluster. They might be clustered at the Splunk instance level or at the OS level, so that the reception of logs is not impacted when any one heavy forwarder goes down. Before we proceed to a deeper understanding of indexer clustering, search head clustering, and forwarder clustering, we also need to understand the disadvantages of not having a cluster. For that, let me bring up a diagram which I have created so that you will be able to understand it clearly.

As we know from our earlier modules, this is what a typical Splunk architecture looks like. I would strongly recommend that before this discussion you glance through module two, where we discussed the different types of architecture and how to build an architecture using Splunk. I have used the same tools and the same diagram that we discussed in module two; this is just to understand why we need clustering in Splunk. To begin with, we have forwarders, or the data sources of Splunk, which send data to our indexers, and the search head searches the data stored on our indexers. This is a non-clustered environment.

So what happens is, every time your data sources send data to the indexers, your forwarder, whether it is a universal forwarder or a heavy forwarder, switches between them. If it has a group of three indexers, it switches to the next indexer every 10 to 30 seconds, depending on your configuration. By default, every 30 seconds it switches the sending of logs from one indexer to another. That is, for the first 30 seconds the forwarder sends the logs to this indexer. After 30 seconds it stops and sends the logs to this indexer. Let’s say these indexers are not clustered.
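As a rough sketch of the switching behaviour just described, the small Python function below picks which indexer a forwarder is sending to at a given moment. It assumes a simple in-order rotation and uses hypothetical names (`target_indexer`, `indexer1` and so on); it is only an illustration of the idea, not how Splunk itself chooses the next indexer.

```python
def target_indexer(elapsed_seconds, indexers, switch_every=30):
    """Return the indexer receiving logs at the given time.

    Assumption for illustration: the forwarder rotates through the
    indexers in order, moving on every `switch_every` seconds.
    """
    slot = int(elapsed_seconds // switch_every)
    return indexers[slot % len(indexers)]


# Hypothetical group of three indexers.
indexers = ["indexer1", "indexer2", "indexer3"]

for t in (0, 29, 30, 60, 90):
    print(t, target_indexer(t, indexers))
```

Over a long enough period, each of the three indexers ends up receiving roughly a third of the logs.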

So at the end of the day, out of 100% of the logs received, roughly 33% of the data is here, 33% here, and 33% here. So 100% of the data is distributed among your three indexers in a non-clustered environment. In that case, let’s say all of them are up: I’m able to search the data and I get 100% of the results. Now let us consider a scenario where the non-clustered environment faces difficulties, that is, one of the indexers crashes.

So what happens? We have 33.33% of the data here and 33.33% of the data here. Whenever a user searches, this indexer is unavailable, so the data on this indexer is not searched. Whatever searches you run are run only on these two indexers, and the results you get are not 100%. You’re losing one third of the information that you are supposed to get. This is one of the major drawbacks of a non-clustered environment.
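The arithmetic of that failure can be sketched in a few lines of Python. The event counts and the names `searchable_fraction` and `events` are made up for illustration; the only point is that a search can see just the data held on indexers that are still up.

```python
def searchable_fraction(events_per_indexer, down):
    """Fraction of all stored events that a search can still reach.

    In a non-clustered setup each event lives on exactly one indexer,
    so anything stored on a crashed indexer is invisible to searches.
    """
    total = sum(events_per_indexer.values())
    reachable = sum(
        count for name, count in events_per_indexer.items() if name not in down
    )
    return reachable / total


# Hypothetical even split of the logs across three indexers.
events = {"indexer1": 1000, "indexer2": 1000, "indexer3": 1000}

print(searchable_fraction(events, down=set()))
print(searchable_fraction(events, down={"indexer2"}))
```

With all three indexers up the fraction is 1.0; with one of them down it drops to two thirds.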

That is, when one indexer crashes, you don’t have access to the data stored on that indexer. This is one of the primary reasons to go for indexer clustering, where the data is shared among the different indexers. So even if one indexer goes down, the other two should be able to return 100% of the data. We will see the different methods we’ll be using to achieve this.

  1. Search Head Clustering Use Case

Now let us consider the disadvantages of not having search head clustering in our environment. Here we have two search heads which are running their searches on three indexers. Let us consider a scenario where this search head fails, so this communication is broken. What happens? Any scheduled searches, alerts, reports, and dashboards that are part of this search head will not be accessible.

That will be a huge impact if a lot of users depend on this search head. Until the search head comes back online, you will not be able to search anything using it, including your reports and dashboards. Scheduled reports and scheduled alerts will not be triggered until the search head is back online. If you have a search head cluster, these reports and alerts are shared among the cluster members.
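To make the difference concrete, here is a minimal Python sketch that lists which saved searches can still run after a search head fails. The names (`runnable_searches`, `sh1`, `sh2`, the search names) are hypothetical, and the model is deliberately simplified: without a cluster a search runs only on the search head that owns it, while with a cluster any surviving member can run any of them.

```python
def runnable_searches(owners, up, clustered):
    """Return the saved searches that can still run.

    owners: dict mapping search name -> search head it was created on.
    up: set of search heads that are currently online.
    clustered: whether the configuration is shared among search heads.
    """
    if clustered:
        # Shared configuration: one live member is enough for everything.
        return sorted(owners) if up else []
    # Standalone search heads: a search dies with its owner.
    return sorted(name for name, head in owners.items() if head in up)


# Hypothetical saved searches on two search heads.
owners = {
    "daily_report": "sh1",
    "error_alert": "sh1",
    "ops_dashboard": "sh2",
}

# sh1 has failed, only sh2 is online.
print(runnable_searches(owners, up={"sh2"}, clustered=False))
print(runnable_searches(owners, up={"sh2"}, clustered=True))
```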

  1. Single Site Indexer Clustering

From our previous discussions we have understood the advantages, or the benefits, of having clustering in our environment. When we talk about clustering in terms of indexers, there are two main types: the single-site indexer cluster and the multisite indexer cluster. Let’s go through the first one, the single-site indexer cluster. In order to understand this better, let us go back to our architecture diagram, where we discussed the failover scenario. Now we can see these indexers communicate with each other.

Considering the previous scenario, we have 33% of the data here, 33% here, and 33% here. Without clustering, the data is split among the three indexers, whereas if clustering is enabled, the data is shared among them. So indexer one receives one packet of data. This packet is replicated to here or here, depending on the configuration of the indexers. For the purpose of this example, let us say I copy this indexer’s data to both of the others. We’ll go through a scenario and understand what happens when we copy this indexer’s data to the other two.

First, the forwarder sends one packet. This packet is indexed here. Once it is indexed, it is copied to indexer number two, and then it is copied to indexer number three. If my indexer goes down, the data is still present here and here, so I need not worry even if one of my indexers goes down. This all happens within my single-site cluster.

Similarly, this holds good for any other indexer. Let’s say this indexer receives a packet: it indexes it locally and then copies it to this one and this one. If this indexer goes down, these two can still give you 100% of your results, and there will not be any elimination of results or loss of data during searching. That is our single-site clustering.
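The walkthrough above can be sketched as a small Python simulation. It follows the example as described, where every packet is copied to both of the other indexers; the function names `replicate` and `search` and the indexer names are hypothetical and only there for illustration.

```python
def replicate(events, indexers):
    """Store each event on its receiving indexer and copy it to every peer.

    events: list of (event_id, receiving_indexer) pairs.
    Returns a dict mapping indexer name -> set of event ids it holds.
    """
    store = {name: set() for name in indexers}
    for event_id, receiver in events:
        store[receiver].add(event_id)  # indexed locally first
        for peer in indexers:
            if peer != receiver:
                store[peer].add(event_id)  # then copied to the other two
    return store


def search(store, down):
    """Return every event id found on the indexers that are still up."""
    found = set()
    for name, event_ids in store.items():
        if name not in down:
            found |= event_ids
    return found


indexers = ["indexer1", "indexer2", "indexer3"]
# Six packets, received by the three indexers in turn.
events = [(i, indexers[i % 3]) for i in range(6)]
store = replicate(events, indexers)

# Even with indexer1 down, the search still returns every event.
print(search(store, down={"indexer1"}) == {i for i, _ in events})
```

Copying every packet to every peer is the simplest case to reason about. How many copies are actually kept is a configuration choice.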

  1. Multisite Indexer Clustering

In order to understand multisite indexer clustering, we’ll be going through one of the architectures that we discussed in our earlier modules. That is this architecture, which has multisite indexer clustering with high availability. This is the architecture we’ll be deploying on Amazon AWS as part of this tutorial. In this architecture, you can see there is a site one and a site two, which makes it a multisite Splunk deployment, with an indexer cluster in site one and an indexer cluster in site two.

In our previous single-site cluster, we understood that any data received on indexer number one will be copied to indexer number two. That is called replication of your data. Similarly, in this environment, the data received by an indexer is copied between the indexers in its own site and also across to the other site, where the other group of indexers is present. At any point, even if all three indexers in one site go down, the data will be available in my second site.

So this is one of the advantages of having multisite indexer clustering. Usually the configuration for an indexer cluster like this is to have two copies of the data here and one copy of the data here. We’ll be controlling this configuration, that is, how many copies there should be, on how many indexers, and which site should hold the most copies, using the search factor and the replication factor in our next videos. Let us run through a scenario where a heavy forwarder sends data to indexer number one.

Indexer number one keeps one copy itself and copies another to indexer number two or indexer number three, based on availability. So we have two copies of the indexed data here. Once these two copies are complete, another copy is sent across to an indexer in the other site, so that if at any point the two indexers holding the same data go down, the data can still be retrieved from the indexers in the other site.
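A minimal Python sketch of this placement, under the two-copies-here, one-copy-there arrangement described above, could look like the following. The names `place_copies`, `is_searchable`, and `idx1` to `idx6` are hypothetical, and the sketch simply picks the first available peer in each site. It does not model how a real cluster chooses its peers.

```python
def place_copies(receiver, sites):
    """Return the indexers holding a copy of data received by `receiver`.

    sites: dict mapping site name -> list of indexers in that site.
    Assumption for illustration: two copies stay in the receiving site
    (the receiver plus one local peer) and one copy goes to the other site.
    """
    origin = next(site for site, members in sites.items() if receiver in members)
    local_peer = next(member for member in sites[origin] if member != receiver)
    remote_site = next(site for site in sites if site != origin)
    remote_peer = sites[remote_site][0]
    return [receiver, local_peer, remote_peer]


def is_searchable(holders, down):
    """Data is searchable while at least one holder is still up."""
    return any(holder not in down for holder in holders)


sites = {
    "site1": ["idx1", "idx2", "idx3"],
    "site2": ["idx4", "idx5", "idx6"],
}

holders = place_copies("idx1", sites)
print(holders)

# Both local holders down, then the whole of site one down.
print(is_searchable(holders, down={"idx1", "idx2"}))
print(is_searchable(holders, down={"idx1", "idx2", "idx3"}))
```

In both failure cases the copy held in site two keeps the data searchable.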

  1. Search Head Clustering

The same scenario holds good for our search heads as well, except that what is replicated on the search heads is reports, dashboards, and alerts rather than indexed data. These are the configurations that will be replicated across your search heads. Along with dashboards, reports, and alerts, this can also include your custom-built applications or some of the premium apps. All this configuration will be replicated to your site two. If your search head fails, you can immediately activate your site two, where it can function without any impact. The same scheduled searches, reports, and alerts can be triggered from the site two instance where clustering has been enabled.
