PL-300 – Section 9: Part 1 Level 8: Other Visualization Items Part 2

  1. Use Error Bars

Now, sometimes the data that you have may not be wholly accurate. For example, suppose, I were to say that the sales growth was this much, plus or minus a little bit. So, the data that I have is probably the most likely version but it could be out by a few percent or it could be within a certain range. This is when error bars could be useful. Error bars allow you to show that range. So, if I go to the add further analyses and scroll down, we can see error bars. These were added in around March, 2022. So, if you’re using an older version, you won’t see it.

Now, if I expand it, you can see that, firstly, it says, what are you applying these settings to? You may have more than one series in your line or bar chart that you want to apply error bars to. Then we’ve got the options.

First of all, do you actually want them? Is it enabled? So, you can switch that on or off, and then you can say these are the values for my low and my high. So, I can either say by field and I’ll have a number in the field for my upper bound and one for my lower bound. I can also say, are these figures the absolute figures or are they relative? So, for example, suppose I had a line chart by the figure was 30,000. I could have an upper bound of 40,000 and a lower bound of 10,000, and therefore I would have an error bar going all the way from 40 to 10,000 with the actual line chart going at 20,000. Alternatively, I can say it is relative. So, I can say the upper bound is out by up to 10,000 and the lower bound by 20,000. So, it takes the 30,000 on the line chart and adds 10,000 and subtracts 20,000.

Now, alternatively, you can also say it can be out by up to this percent. So, you can see that it has added error bars because this is a lot easier way to use these error bars if you don’t have absolute figures. So, I can say this can be wrong by up to 50%, and now you can see that the error bar goes all the way up to 50% above the sales volume. Or it could be as much as, let’s say 50% below, or maybe even a hundred percent below. So, whatever you choose, it should be realistic of your data.

The next section is bar. So, this shows or doesn’t show the bar. And here you can format it. You can match the colour series. So, you can see changes to the blue that this series is using. Or you can say, “No, I want to specify my own colour.” So, I want red, for instance.

You can say what width you want and also the shape that is shown at the very top and the very bottom of the error bar. So, traditionally it’s a horizontal line and a thin width. You can also say what size the marker, the thing at the top, is going to be. You can also have a border colour and a border size. So, generally, you might not want to adjust any of these settings.

Now, suppose that you didn’t want a bar, suppose you wanted a band. Well, you can do that as well. So, you can say, “Okay, I want a band from the top filled down to the bottom of the error range.” Now, you can say, “I want a fill,” or “I want a series of lines.”

Now, it’s a bit difficult to see the difference but if I go up into years, you can now see that this is shown as a line and that is shown as a filled area as opposed to just a simple bar. So, whichever one is most effective in your visualisations is the one that you should use. And you can say, I want both a fill and a line, if you so wish. You can also adjust the band colour or you can make it match the series. Change the transparency, so lighter usually works, and the marker shape and size. These last two aren’t used if you’ve already got the bar shown because that’s a bit duplicationary.

And then we have got error labels. So, that can show the absolute figures that are being used, or it could show the difference in percentage terms. So, I said 10% above and 10% below as opposed to the, in this case, the sales volume above and below.

And I can also show the range as well which is fairly similar to the absolute figure. I can also adjust the background of these labels and change the transparency.

And then finally there is a Tooltip. So, if I hover over them you can see the central value, in this case 24.47%, and the upper and the lower extent of our error bars.

So, these are error bars. You can use them by clicking on your line or bar chart and going to add further analyses. Go to error bars and then you can enable them, say how you’re going to have the top and the bottom of the range specified, either by field or by percentage. And then you can adjust the bar, the error band, and the error labels, and tooltip accordingly.

  1. Identify Outliers

In this video, I’m just going to have a quick word about how to identify outliers.

Now, we’ve already had a look at how we can make use of the analytics lines and putting constant lines and see what values are above and below. Now, it doesn’t look like we are significantly above here, about 20%. It doesn’t seem like 2006 there is much of an outlier. But the problem with this is that we’re at the year granularity, all of our axes data is in years. If I drill down so that it is months, have a look now at 2006. But then have a look at 2010; 2009:2010. Now, it didn’t really look that 2009:2010 there could be such huge values. But if you drilled down to something closer to the source data, it does look like there’s something big.

Now, another thing that you can do is to use the scatter chart. So, this really gets down to a base level of granularity when used right. So, let’s say we have the sales volume in the X-axis and then we have the average price in the Y-axis.

Now, so currently we only have one point because this is averaging everything together. So, what we need is the details, let’s see if we can drag dates in. So, go down the hierarchy. And the further down the hierarchy you go, the more data you’ve got. So, we can see for instance these two end ones with zero sales definitely looks like there are outliers.

Now, we can also add in a legend, maybe we’ll add in the legend of the region name. And again, this increases the number of dots we got. So, we can see potentially, we’ve got things like this Greater Manchester in 2004 July, that is a bit of an outlier in that we have got a huge number of sales, but not necessarily huge price it’s just somewhere separated from everything else. And then we’ve got these in West Midlands where we’ve got very low sales, but high level of prices.

So, there’s no particular magic formula to identify outliers. It’s the case of looking at the data at various granularities and going what is this data telling us? Because some granularities, for instance, at the year you might not be actually seeing anything that’s an outlier whereas if we go down to the Christmas period of that year, we can see real outliers. And if you have got two measurements that can be plotted against each other, then do so in a scatter graph with as much detail as you can and then see if any of these figures, these individual plots are just away from each other. They could represent outliers. They could represent typographical errors, you know, you didn’t have 1000 sales, it shows you 10,000 or it shows you a hundred sales. And in that way, you can investigate the data better, and see if there are any obvious errors.

  1. ** Use Clustering

Now, in this scatter visualisation, I was able to use a region named legend. So, all of these dots got little colours.

Now, suppose, I haven’t gone down to this level of detail. Let’s take away region name and let’s change date from date hierarchy to just a standard date. So, I’ve got a bit at the top, I’ve got some bits in the middle and then I’ve got this big bit at the bottom. So, I can divide this scatter visualisation into three.

Now, we can get the computer to do this automatically by using something called clusters. And I do that by going to the dot-dot-dot (…) of my scatter visualisation, and saying automatically find clusters. So, what it’s going to do is it’s going to add a new field and it’s going to be called date and in brackets clusters.

Obviously, you can change this, however, you wish. And it’s going to give a description and it’s going to just look at the data and decide for itself how many clusters it needs. So, I’m going to click on, okay. And you will now see that we have a legend date brackets clusters. And it’s put them into three clusters; cluster one, two, and three. So, it’s got this bit on the top left, and then this rectangle here and then this rectangle at the bottom.

Now, if you say actually, I don’t want three clusters I want four clusters, then no problem. Just go to the field that has been created. So, in this case, date brackets clusters, click on the dot-dot-dot (…), and you can edit to clusters. So, maybe, I want to change to five clusters.

Now, changing the number of clusters will rerun the clustering and discard any current names like cluster one, two, three, four, and five. So, suppose I don’t want it to be called one, two, three, four, five, no problem, I can go to edit clusters, and double-click on cluster one. So, I’ll call this top left, and now it’s renamed.

Now, each of these dots, each of these calculations where sales volume meets average price based on date, has been saved into power BI. So, what?

Well, let’s suppose we had the 2017 data coming in addition to all the data up to 2016, all of those that would not be put into a cluster automatically, because they are not one of these existing dots. So, if you want that to be re-evaluated, then just go into edit clusters and edit the number of clusters. So, if I district it down to four, then it would recalculate the clusters while including all of the new data.

Now, there are some limitations on this. So, if I get rid of this cluster and if I delete my cluster field, which incidentally now is just a regular field, so you can filter on it, you can create other visualisations based on it. So, I’m just going to delete it from the model and I’m going to change this from a date in the details where we can see automatically find clusters, to the date hierarchy. And you can see automatically find clusters no longer appears.

So, there are some limitations regarding this, but if you want a way to automatically add, all of the colours and sort of group all of your scatter plots, which are in a similar sort of area, like all of those in the top left, all of those in the top right, then you can convert it to and simpler scatter visualisation and then go to the dot, dot, dot, and automatically find clusters.

img