Managing Azure VM Diagnostics Data in Table Storage

Windows Azure Diagnostics (WAD) data is stored in the table storage of a storage account. Several tables hold the different kinds of diagnostics data for Azure Windows VMs; some of them are listed below:

  1. WADMetrics*: Stores metrics data. Table names follow a fixed naming convention: a prefix such as WADMetricsPT1HP10DV2S followed by a date
  2. WADDiagnosticInfrastructureLogsTable: Stores diagnostics infrastructure data
  3. WADWindowsEventLogsTable: Stores event logs data
  4. WADPerformanceCountersTable: Stores performance counters data

The problem is that there is no mechanism to apply a retention policy to these tables, nor any built-in way to delete the data directly. Over time the data keeps growing, and you pay for all of it. As of August 2019, you have to delete the old data or tables manually. There are two ways to do it:

  1. Delete the tables altogether: This is simple, and Azure re-creates the tables to store new WAD data. However, if a table has millions of records (like mine), it might take a few days for the deletion to complete, and during this time any attempt to re-create the table throws an error. So if your VM writes to these tables constantly, this is not a good option.
  2. Partially delete the data: Even though it is cumbersome and requires writing custom code, I found this approach better. You can run the code on a schedule to delete all data that is more than a month old (or older than any other time frame). Note that you pay for every operation, including deletes, even though the cost is very low.

I decided to write a utility that deletes all data more than a month old from the logs and performance counters tables. The metrics tables are handled differently: since new ones are created three times a month, every month, I delete whole tables that are more than two months old.

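Below is the code to achieve this in a regular console application, which you can also run as a web job. Install the “WindowsAzure.Storage” NuGet package for it to work. The code reads the connection string named StorageAccountConnectionString from the application's configuration file.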

Delete all metrics tables more than two months old:

Run the code below twice, once for the PT1H tables and once for the PT1M tables. If your storage account holds data from before the current year, also run it for the previous years by looping over those years (covering all of their months) instead of only the months of the current year.


using System;
using System.Configuration; // ConfigurationManager lives here, not in Microsoft.Extensions.Configuration
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

/// <summary>
/// Deletes old metrics tables (PT1H or PT1M) of the current year.
/// </summary>
/// <param name="tablePrefix">WADMetricsPT1HP10DV2S or WADMetricsPT1MP10DV2S</param>
private async Task DeleteOldTables(string tablePrefix)
{
    try
    {
        string connectionString = ConfigurationManager.ConnectionStrings["StorageAccountConnectionString"].ConnectionString;
        CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString);
        CloudTableClient cloudTableClient = cloudStorageAccount.CreateCloudTableClient();

        string currentYear = DateTime.Now.ToString("yyyy");
        int currentMonth = DateTime.Now.Month;

        // Keep the current and the previous month; delete every earlier month of this year.
        for (int month = 1; month < currentMonth - 1; month++)
        {
            // Convert a single-digit month to two digits, e.g. 3 -> "03".
            string currentTablePrefix = tablePrefix + currentYear + month.ToString("d2");

            // A continuation token belongs to one listing, so start fresh for every prefix.
            TableContinuationToken token = null;
            do
            {
                var allTablesResult = await cloudTableClient.ListTablesSegmentedAsync(currentTablePrefix, token);
                token = allTablesResult.ContinuationToken;
                Console.WriteLine("Fetched tables to be deleted, count: " + allTablesResult.Results.Count);
                foreach (CloudTable table in allTablesResult.Results)
                {
                    Console.WriteLine("Deleting table: " + table.Name);
                    await table.DeleteIfExistsAsync();
                }
            } while (token != null);
        }
        Console.WriteLine("Old tables deleted");
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception occurred while deleting tables: " + ex.Message);
    }
}
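The method above only looks at the current year, and in January or February its loop finds nothing to delete. As a rough sketch of the "previous years" case, the helper below builds every monthly table prefix from a starting year up to the cutoff using date arithmetic. The class name, the `firstYear` parameter and the `monthsToKeep` parameter are hypothetical and not part of the original utility; the prefix format (prefix + yyyy + MM) is taken from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WadMetricsPrefixes
{
    // Returns one prefix per month (tablePrefix + yyyyMM), starting at January of
    // firstYear and stopping before the oldest month that should be kept.
    // monthsToKeep = 2 keeps the current and the previous month, like the loop above.
    public static IEnumerable<string> OlderThan(
        string tablePrefix, DateTime now, int firstYear, int monthsToKeep = 2)
    {
        if (monthsToKeep < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(monthsToKeep));
        }

        // First day of the oldest month to keep.
        DateTime cutoff = new DateTime(now.Year, now.Month, 1).AddMonths(-(monthsToKeep - 1));

        for (DateTime month = new DateTime(firstYear, 1, 1); month < cutoff; month = month.AddMonths(1))
        {
            yield return tablePrefix + month.ToString("yyyyMM", CultureInfo.InvariantCulture);
        }
    }
}
```

Each returned prefix could then be passed to ListTablesSegmentedAsync in place of currentTablePrefix, with a fresh continuation token per prefix.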

Delete logs data more than x days old:

Run the code below for each of the diagnostics logs tables (infrastructure logs, event logs, performance counters, and so on).


using System;
using System.Configuration; // ConfigurationManager lives here, not in Microsoft.Extensions.Configuration
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

/// <summary>
/// Deletes old entities from a diagnostics logs table.
/// </summary>
/// <param name="numberOfDays">Entities older than this many days are deleted</param>
private async Task DeleteOldData(int numberOfDays)
{
    string connectionString = ConfigurationManager.ConnectionStrings["StorageAccountConnectionString"].ConnectionString;
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString);
    CloudTableClient cloudTableClient = cloudStorageAccount.CreateCloudTableClient();
    CloudTable cloudTable = cloudTableClient.GetTableReference("WADPerformanceCountersTable"); // do this for the other logs tables too

    // Querying on PartitionKey ("0" + ticks) is faster than filtering on Timestamp, see
    // https://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
    string ticks = "0" + DateTime.UtcNow.AddDays(-numberOfDays).Ticks.ToString();
    TableQuery<TableEntity> query = new TableQuery<TableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThanOrEqual, ticks));
    Console.WriteLine("Deleting data from " + cloudTable.Name + " using query " + query.FilterString);

    TableContinuationToken token = null;
    try
    {
        do
        {
            Console.WriteLine("Fetching records");
            TableQuerySegment<TableEntity> resultSegment = await cloudTable.ExecuteQuerySegmentedAsync(query, token);
            token = resultSegment.ContinuationToken;
            Console.WriteLine("Fetched records to be deleted, count: " + resultSegment.Results.Count);
            foreach (TableEntity entity in resultSegment.Results)
            {
                Console.WriteLine("Deleting entry with Timestamp: " + entity.Timestamp.ToString());
                TableOperation deleteOperation = TableOperation.Delete(entity);
                await cloudTable.ExecuteAsync(deleteOperation);
            }
        } while (token != null);
        Console.WriteLine("Entities deleted from " + cloudTable.Name);
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception occurred while deleting data from table " + cloudTable.Name + ": " + ex.Message);
    }
}
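The query above relies on the PartitionKey being "0" followed by the UTC tick count, as described in the first reference. Table storage compares string keys character by character, and the comparison is chronological here only because the tick counts of present-day dates all have the same number of digits. The sketch below isolates that calculation so it can be checked on its own; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class WadPartitionKey
{
    // Builds the PartitionKey boundary for "numberOfDays before utcNow".
    // Assumption taken from the query above: keys look like "0" + UTC ticks.
    public static string ForCutoff(DateTime utcNow, int numberOfDays)
    {
        long ticks = utcNow.AddDays(-numberOfDays).Ticks;
        return "0" + ticks.ToString(CultureInfo.InvariantCulture);
    }
}
```

For example, WadPartitionKey.ForCutoff(DateTime.UtcNow, 30) gives the value to compare against with QueryComparisons.LessThanOrEqual, and a cutoff further in the past always sorts before a more recent one.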

You can use the code above either to delete data incrementally on a schedule (via a web job, an Azure Function, etc.) or to delete all of the old data in one run.
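Since every delete is a billed operation, one possible variation (not what the utility above does) is to group deletes into batches. Table storage accepts up to 100 operations per batch, and all entities in a batch must share the same PartitionKey. The sketch below assumes the entities come from a query segment like the one above; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class BatchDelete
{
    // Table storage allows at most 100 operations in one batch.
    private const int MaxBatchSize = 100;

    // Deletes the given entities, sending one batch per PartitionKey
    // and splitting a partition into several batches when it is large.
    public static async Task DeleteInBatchesAsync(CloudTable table, IEnumerable<TableEntity> entities)
    {
        foreach (var partition in entities.GroupBy(e => e.PartitionKey))
        {
            var batch = new TableBatchOperation();
            foreach (TableEntity entity in partition)
            {
                batch.Delete(entity);
                if (batch.Count == MaxBatchSize)
                {
                    await table.ExecuteBatchAsync(batch);
                    batch = new TableBatchOperation();
                }
            }

            // Send whatever is left for this partition.
            if (batch.Count > 0)
            {
                await table.ExecuteBatchAsync(batch);
            }
        }
    }
}
```

Inside the do/while loop this would replace the per-entity foreach with a call such as await BatchDelete.DeleteInBatchesAsync(cloudTable, resultSegment.Results). How much it saves depends on how many entities share a PartitionKey in your tables.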

P.S. The Cloud Table Client uses an exponential retry policy by default, which you can change. If you do not want any retry mechanism, make sure to set the policy to none. The good thing is that you can set a retry policy for a particular operation (like Delete) instead of for the table client as a whole. For this, check implementing retry policies
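As a small sketch of both options, the snippet below changes the default policy on the client and, separately, turns retries off for a single delete. The retry interval and attempt count are arbitrary example values, and the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using Microsoft.WindowsAzure.Storage.Table;

public static class RetryExamples
{
    // Applies to every request made through this client:
    // retry up to 3 times, waiting 2 seconds between attempts.
    public static void UseLinearRetryByDefault(CloudTableClient client)
    {
        client.DefaultRequestOptions.RetryPolicy = new LinearRetry(TimeSpan.FromSeconds(2), 3);
    }

    // Applies to this one operation only: no retries for the delete.
    public static Task DeleteWithoutRetryAsync(CloudTable table, ITableEntity entity)
    {
        var options = new TableRequestOptions { RetryPolicy = new NoRetry() };
        return table.ExecuteAsync(TableOperation.Delete(entity), options, null);
    }
}
```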

Hope this helps! Happy coding!

References:

  1. Effective way of fetching diagnostics data from Windows Azure Diagnostics Table
  2. Understanding Windows Azure Diagnostics Costs And Some Ways To Control It