SPLK-2002 Splunk Enterprise Certified Architect – Distributed Splunk Architecture Part 3

  1. Masking Sensitive Data at Index Time

Hey everyone and welcome back. In today’s video we will be looking into masking sensitive data before it gets indexed. Now, it might happen that the log file you are ingesting contains certain sensitive information like credit card numbers, Social Security numbers, or various other sensitive details.

Now, in such cases you might want to mask that information in such a way that any analyst who is monitoring your logs would not see details like credit card or Social Security numbers. As an example, let’s say this is the sample log file, and on the right-hand side you have credit card information. What you might want is to transform this information in such a way that when it gets indexed into Splunk, it comes out similar to what you see in the second example.

So this transformation ideally has to happen at index time itself. Before the data gets indexed, it should be transformed so that whoever analyzes it will not be able to find out the original information. We already discussed, while covering the indexer component, that there are two stages: one is the parsing stage and the second is the indexing stage. So before the indexing stage, we want to parse the information in such a way that any data containing such sensitive values is masked with x characters before it goes to the indexing queue. And this is something we’ll be looking at in today’s video. We have a file called mask.txt, and this is the sample log file. All thanks to the Splunk documentation for that; I did not really have to create this file.
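As a rough illustration (the exact fields in mask.txt may differ; this event format is only an assumption), a raw event and its masked counterpart might look like this:

    Before: ss=123456789, cc=1234-5678-9012-3456
    After:  ss=xxxxx6789, cc=xxxx-xxxx-xxxx-3456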

So we’ll just look into how exactly it works. This is a sample log file, and what we’ll do is put it into Splunk. I’m in my Splunk instance, so let’s do one thing: we’ll go to /tmp and create a file called accounts.log. Since nano is not there, we’ll use vi to create accounts.log, and we’ll copy the contents of mask.txt into the accounts.log file. Perfect. Once you have done this, let’s go to /opt/splunk/etc/system/local. Within local there is a file called inputs.conf, and in this file you only have the default stanza. What you want is for Splunk to monitor the accounts.log file that we have created.
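For reference, the shell steps might look like this (assuming mask.txt was also saved to /tmp; the paths are the ones used in the video):

    cd /tmp
    # create accounts.log with the sample events from mask.txt
    cp mask.txt accounts.log    # or open "vi accounts.log" and paste the contents
    cd /opt/splunk/etc/system/local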

So this is a sample from the documentation which shows the way in which you can set up monitoring. Again, this is very simple, but just for documentation’s sake I have added it in. So let’s do one thing: let’s go to inputs.conf and use this stanza. What is it doing? It is monitoring; and what is it monitoring? It is monitoring the accounts.log file, and it is assigning it this source type. So this is the source type which has been assigned, and let’s go ahead and save it. The next thing we need to do is add a props.conf. I’m sure you already know props.conf by now; we have been discussing it in detail in the previous sections. Now within props.conf, if you see, we have two sections. Let’s take this up and paste it here.
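A minimal sketch of that monitor stanza, assuming the file lives at /tmp/accounts.log and the source type is named ssn-cc-anon to match the props.conf stanza we look at next:

    [monitor:///tmp/accounts.log]
    sourcetype = ssn-cc-anon
    disabled = false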

So you have the two sections; one is the ssn-cc-anon stanza. Within this stanza we are using a setting called SEDCMD. This SEDCMD basically allows us to define a sed expression. sed is a Linux utility; I’m sure many of you might have already used it. So SEDCMD allows you to use a sed-like syntax as the value of the setting. You also have the option of regex-based masking alongside sed-based commands. Now if you quickly look into the props.conf manual and search for SEDCMD, you will typically see SEDCMD-<class>: you can give a class name, this setting is used for anonymizing the data, as we see here, and the sed script goes in as the value. So this is why the SEDCMD setting has been used, and this is a script which matches the data that we have within accounts.log.
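Here is a minimal sketch of such a props.conf stanza; the exact sed expressions in the video’s file may differ, and these are written against the hypothetical event format shown earlier:

    [ssn-cc-anon]
    # keep only the last four digits of the Social Security number
    SEDCMD-ssn = s/ss=\d{5}(\d{4})/ss=xxxxx\1/g
    # keep only the last group of the credit card number
    SEDCMD-cc = s/cc=\d{4}-\d{4}-\d{4}-(\d{4})/cc=xxxx-xxxx-xxxx-\1/g

Each SEDCMD-<class> value is an ordinary sed substitution, and the capture group (\1) preserves the part of the value that is safe to show.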

So let’s go ahead and save this, and let’s restart our Splunk instance with /opt/splunk/bin/splunk restart. Perfect. Now that Splunk has restarted, if we quickly go to the Splunk instance and open the Search & Reporting app, there is one event which has come in. If I go to Data Summary and click on the host, you will see that within the event the credit card information is masked. So this is done at index time itself. Now, within this example we made use of SEDCMD; you can also make use of regex-based transforms to define what needs to be masked and do the masking at index time accordingly, as sketched below. So with this we’ll conclude this video. I hope this has been informative for you, and I look forward to seeing you in the next video.
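For completeness, here is a hedged sketch of that regex-based alternative; the stanza and class names are illustrative, not taken from the video:

    # props.conf
    [ssn-cc-anon]
    TRANSFORMS-anonymize = ssn-anon

    # transforms.conf
    [ssn-anon]
    REGEX = (.*)ss=\d{5}(\d{4})(.*)
    FORMAT = $1ss=xxxxx$2$3
    DEST_KEY = _raw

Writing the rewritten event back to DEST_KEY = _raw is what makes this an index-time mask rather than a search-time field extraction.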

  2. Search Head

Hey everyone and welcome back. In today’s video we will be discussing the search head component in Splunk. Basically, the search head is a component in Splunk Enterprise whose responsibility is to manage the search-related functions. It is responsible for sending the search request to the appropriate indexers, retrieving the results, and merging the results before sending them back to the user. This can be explained with the below diagram, where you have multiple indexer nodes and then you have a search head. The search head component will send the request to the indexer nodes. The indexer node, as I hope you remember, is responsible for a lot of functions, the primary ones being parsing and indexing. The search head will not contain any data; it will send the request to the indexers, and the indexers will fetch the results.

Depending on the request that the search head component gives, the indexers send the results back to the search head. The search head will merge all the results and send them back to the user. So that is the primary responsibility of the search head. Now, typically when you are doing a clustered setup, or even a distributed setup, you will have a search head cluster and you will have the indexer cluster. Below the indexer cluster you will see the forwarders; these may be universal forwarders. They will send the data to the indexer nodes, and the data will be stored in that specific cluster. So no indexed data will be stored in the search head cluster; all the indexed data will be stored in the indexer cluster.

Now, in order to search this cluster, we make use of the search head. If you see this diagram, one search head member connects to multiple indexer nodes, because in a clustered setup it might happen that, say, 20% of the data is on indexer 1, 20% is on indexer 2, and 20% is on indexer 3. So typically when you search, the search needs to run across all the indexer nodes. The search head will send the search request to all of the indexer nodes and retrieve the results, and the user will be able to find those result sets on the search head member. So when it comes to the search head, it is used for a number of functions.
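As a minimal sketch (the hostname and credentials are placeholders), in a simple distributed setup a search head is pointed at its indexers by registering each one as a search peer:

    /opt/splunk/bin/splunk add search-server https://indexer1.example.com:8089 \
        -auth admin:changeme \
        -remoteUsername admin -remotePassword changeme

In an indexer cluster the search head is instead configured against the cluster master, which is something we will cover in the dedicated clustering sections.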

Some of the primary ones definitely include the search-related functions, along with building dashboards and reports as required. It is also used for building data models and data model acceleration. And the alerting-related functionality is also built within the search head. So typically, if you have this type of architecture, the users will not search the indexer nodes directly. They will log in to the Splunk instance which is part of the search head cluster and put the SPL query there. The search head will interact with the n number of indexer peer nodes which are present in the cluster, retrieve the results, and send them back to the user. So that is the primary function of the search head component. In the upcoming sections we will be discussing how we can form the search head cluster after the indexer cluster is built; we have two sections dedicated to the indexer cluster as well as the search head cluster.

  3. Splunk Monitoring Console

Hey everyone and welcome back. In today’s video we will be discussing Splunk’s monitoring console. Till now we have been discussing primarily search heads and indexers. However, let’s say you have a Splunk instance in your organization and a user comes back to you and says, hey, the searches are not working quite fast, there is a lot of lag happening, the search performance is slow. So monitoring is one of the most important aspects once you have your deployment set up. You need to continuously monitor not only your server performance but also various other areas, including indexing performance, search performance, et cetera. And all of this is the capability of a component called the monitoring console.

Now, the monitoring console basically allows us to see detailed information not only about the topology but also about your Splunk Enterprise deployment. The monitoring console provides prebuilt dashboards which give you visibility into various areas, as we discussed, ranging from indexing performance to resource usage, license usage, and many more. This is a screenshot that I have taken from one of the clusters that we have, and you will see it says that there are three indexers and three search heads. It also says that one of the three indexer instances is unreachable and one of the search head instances is also unreachable. It is also telling us about the resource usage, saying that the CPU average is 75.25% and the memory average is 15%, and on the right-hand side it says that there are two peers which are searchable. It speaks about the bucket copies and the total buckets; this is related to the indexers, as well as various CPU and memory usage figures.

Now, the Splunk monitoring console comes prebuilt with a lot of dashboards which we can use out of the box. We do not really have to build any dashboards; we can just use the ones which come prebuilt. These dashboards are based on the following areas: search performance and the distributed search framework, indexing performance, operating system resource usage, index and volume usage, HTTP Event Collector performance, TCP performance, and various others. So that was the theoretical part; let’s go to Splunk and see where we find all of these. I’m in my Splunk instance, and if you go to Settings, on the left-hand side you have the Monitoring Console feature. So let’s click on the monitoring console. This is how the Splunk monitoring console overview page looks. If you see on the left-hand side, it is showing us the indexing rate.

It also shows the disk usage and the license usage, and if you go a bit down it will tell you about the CPU usage in terms of which process is taking the highest CPU. It talks about the memory usage, and it can also tell you information related to the KV store. Now, this monitoring console has a lot of options: you have indexing performance, you also have search usage statistics. If you go to resource usage, you have resource usage by instance and resource usage by machine. You have forwarder-related dashboards. All of these dashboards come prebuilt with the monitoring console. And typically, when you have a Splunk instance up and running in production, the monitoring console is one of the important components that you should be regularly looking at.
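Much of this resource data lives in the _introspection index, so you can also query it directly; a rough sketch, with the caveat that field names can vary by Splunk version:

    index=_introspection sourcetype=splunk_resource_usage component=Hostwide
    | timechart avg(data.cpu_system_pct) AS avg_cpu_system avg(data.mem_used) AS avg_mem_used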

So, for example, if we go to resource usage and scroll a bit down, it shows the number of CPU cores and the machine hostname, and if you go further down it gives you the disk usage for the root partition and for the /opt/splunk/var partition, and below that you’ll get more information related to CPU usage by process and various others. The same goes for resource usage at the machine level: if you go down, it will give you information related to the physical memory which has been used, the load average on the machine, and various other prebuilt dashboards. Along with that, if you go to the indexing performance dashboard, it will show you how the indexing performance looks. I’m sure you might have already seen this diagram, or not exactly this one but something similar, when we were discussing the indexer component, where we had the parsing pipeline as well as the indexing pipeline.

After the index pipeline, the data goes directly to disk, so based on this you get various performance metrics related to the indexing rate. You can search by source type, index, host, as well as source. And if you go a bit down, you get the estimated CPU usage by Splunk processor. There are various other dashboards which are built in. Along with that, the search usage statistics also prove to be important many times: they give you the search count as well as the median runtime, that is, how long searches take over a timeline, as well as the long-running searches.

This is also important because it might happen that there is a search running in the background which is taking a huge amount of CPU and memory, and any user who is trying to search at that time is not able to get results in a meaningful amount of time. For such use cases you can look into the long-running searches. This is very similar to what slow queries are in the database world. So all of these are part of the monitoring console.
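As a hedged sketch of how you could hunt for such searches yourself from the audit logs (the monitoring console’s own panels may be built differently):

    index=_audit action=search info=completed total_run_time=*
    | stats max(total_run_time) AS runtime_seconds BY user, search
    | sort - runtime_seconds
    | head 10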
