The recency of the data your dashboard is displaying can be one of the most important pieces metadata on a dashboard. Users of dashboards will undoubtedly, at some point in time, have the question go through their mind: “I wonder when this data is from?” or similar. They will want to know the recency of the data, or the Data Freshness.
Using The Right Dates
Data Freshness is the date and/or time at which the data was last updated. More accurately, it is the latest date up to which the data is reporting.
That’s an important distinction, in fact. The Data Freshness is not just when the dashboard’s dataset was refreshed, or even when the underlying dataset was refreshed. In fact, it isn’t even necessarily when the source system database was refreshed. The Data Freshness shouldn’t be tied to any ‘refresh’ date at all, really. Instead, the Data Freshness should be the latest date from a relevant field in the data set. For example, this could be Transaction Date, or Order Date, or Payment Date, etc. Ideally a date that is relevant to the subject matter of the dashboard.
Why is this? Primarily this is due to potential issues in ETL processes or data updates. Take this scenario for example: the preceding ETL or data preparation process fails, and no data in the underlying dataset is updated. The dashboard data source is still refreshed, but does not pull any new data, as there is none to pull. If your Data Freshness label is using the ‘Last Refresh Date’ as the indicator, it will mislead users in to thinking it is up-to-date when in fact it isn’t. This could go on for any number of ETL cycles until it is discovered by a user – which will greatly decrease the trust that user and their peers has in your dashboard.
If, however, the Data Freshness label pulls from a date field in your dataset, then it will give a true and accurate reflection of what dates the data covers, and whether it is out-of-date or up-to-date.
Go a step further, and have multiple dates. Both the Data Freshness and the Last Refresh Date.
How to do this in Tableau
An easy way to pull the latest date from the data is using a whole-set LOD calculation. That is, a level of detail calculation that does not define a detail level. For example: {MAX([Transaction Date])}
The curly brackets indicate a Level of Detail calculation – which usually includes either FIXED, EXCLUDE, or INCLUDE. When those keywords are omitted, the LOD instead applies the aggregation across the entire data set. This will avoid any filters you have applied to the sheets and always get the maximum (except if you’re using context filters or extract filters).
If you are pre-filtering your dashboard based on the user, then it would be wise to include that dimension in the LOD.
Displaying Data Freshness
Data Freshness labels can be displayed on any and all dashboards, either prominently or using progressive disclosure techniques (such as tooltips or help menus).
If your dashboard is refreshed frequently (e.g. every 1 hour or so), then it is a good idea to also include how many minutes/hours ago it was refreshed. Most people, once they see when it was last updated, will automatically look at their clock and calendar and calculate how long ago that was. Why not take that step out of it for the user, and show them that information straight away. This will reduce the cognitive load of using your dashboard (only slightly, but it all adds up).
Exceptions to The Rule
There are always exceptions to rules, and in this case the main exceptions come in when there are no dates in your data. Or, the dates in your data aren’t related to when that record was created or updated. In these cases, using the date could be misleading, as the data could have changed, and you may be saying that it hasn’t. In this case, you could opt for using a metadata column in your database that indicates when the record itself was last updated (e.g. dtmUpdated).
If all of this is too hard, then showing when the dataset itself was last refreshed is better than nothing. We can’t always have what’s best!