Analysing time-series data as they are collected in (near) real time often gives insight into a process that is still under way. With the COVID-19 situation still critical, large amounts of data are being collected, curated and streamed throughout the world, and researchers strive to get the most out of them to identify patterns and, perhaps, make predictions. In this post I show how to download and analyse live data using MATLAB/Octave.

The first step is to find data sources, which should in any case be reliable. For the purposes of this post I am going to use two sources:

- open-covid-19.github.io (direct link to CSV data)
- data.world (direct link to CSV data)

Next, the data have to be downloaded and parsed before they can be processed and visualised. The goal in this post is to plot the confirmed cases and deaths, normalised by country population, for a selection of countries: China, Italy, Spain, United Kingdom, United States of America and Greece.
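To make the normalisation concrete, here is a minimal sketch with made-up numbers; `confirmed` and `population` are hypothetical values chosen for illustration and are not taken from either data source.

[Matlab]
% hypothetical cumulative confirmed cases on three consecutive days
confirmed = [100 150 230];
% hypothetical country population
population = 60e6;
% share of the population, expressed as a percentage
confirmed_pct = 100 * confirmed / population;
% the same figures per 100,000 inhabitants, which is often easier to read
confirmed_per_100k = 1e5 * confirmed / population;
disp([confirmed_pct; confirmed_per_100k]);
[/Matlab]

Dividing by the population is what makes countries of very different sizes comparable on the same axis.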

Let’s take a look at the code that downloads and processes the data. Suppose we are considering the data from `GitHub`.

[Matlab]

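% URL of the CSV file with the live data
sourceURL = 'https://open-covid-19.github.io/data/data.csv';

% download the whole file as one character vector
live_data = urlread( sourceURL );

% ten columns: the header row is read as strings, the data rows as typed fields
theHeaderFormat = '%s%s%s%s%s%s%s%s%s%s';
theDataFormat = '%D%s%s%s%s%f%f%f%f%f';

data = string2table( live_data, theHeaderFormat, theDataFormat);

data.CountryCode = categorical( data.CountryCode);
data.CountryName = categorical( data.CountryName);
data.RegionCode = categorical( data.RegionCode);
data.RegionName = categorical( data.RegionName);

% these indices refer to the timetable created below,
% where the time column is no longer counted as a variable
confirmed_cases_column = 5;
deaths_column = 6;
population_column = 9;

data = table2timetable( data );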

[/Matlab]

The conversion of some fields into `categorical` is not mandatory at this point, although the `==` comparisons on `CountryName` used later rely on it. The auxiliary function `string2table` transforms the string containing all the downloaded data into a meaningful table, as shown in the following piece of code. The `X_column` variables hold the indices of the columns in the data table that contain the corresponding `X` data. Finally, `table2timetable` converts the table into a timetable, i.e. a table indexed by time.

[Matlab]

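function tabl = string2table ( s, headfmt, datafmt )

% read the first row only: one string per column, used as variable names
Fields = cellfun(@(x) x{1}, textscan(s, headfmt, 1, 'Delimiter', ','), 'UniformOutput', false);

% skip the header row and read the data rows with their proper types
theData = textscan(s, datafmt, 'HeaderLines', 1, 'EndOfLine', newline, 'Delimiter', ',');

tabl = table(theData{:}, 'VariableNames', Fields);

end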

[/Matlab]
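To see what the two `textscan` passes inside `string2table` return, the following sketch runs the same kind of calls on a tiny CSV string. The sample text, its three columns and the variable names `sample_csv`, `header`, `fields`, `cols` and `t` are made up for illustration. Dates are read as plain strings (`%s`) here to keep the example independent of date parsing.

[Matlab]
% hypothetical CSV text: one header row and two data rows (values are made up)
sample_csv = sprintf('Date,CountryName,Confirmed\n2020-03-01,Italy,100\n2020-03-02,Italy,150');

% first pass: read a single row of three strings, i.e. the header
header = textscan(sample_csv, '%s%s%s', 1, 'Delimiter', ',');
fields = cellfun(@(x) x{1}, header, 'UniformOutput', false);

% second pass: skip the header row and read the typed data columns
cols = textscan(sample_csv, '%s%s%f', 'HeaderLines', 1, 'Delimiter', ',');

% one table variable per column, named after the header fields
t = table(cols{:}, 'VariableNames', fields);
disp(t);
[/Matlab]

Each element of `cols` holds one whole column, which is why it can be expanded directly into the `table` call.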

Then all we need to do is define the countries of interest and go ahead with the plots.

[Matlab]

countries_of_interest = {'China','Italy','Spain','United Kingdom','United States of America','Greece'};

[/Matlab]

Suppose we want to create two graphs, one for the normalised confirmed cases and one for the normalised reported deaths. The normalisation is based on the respective country population, so that each graph corresponds, in a sense, to *the power the country loses*. First, let’s create the confirmed cases graph.

[Matlab]

figure; hold on; set(gcf,'position',[1000 1000 800 500])

for c = 1:length(countries_of_interest)
    % read the confirmed cases column from the data
    country_data = data(find(data.CountryName==countries_of_interest{c}), confirmed_cases_column);
    % read the population column from the data
    country_population = data(find(data.CountryName==countries_of_interest{c}), population_column);
    % select the population reported in the first occurrence of a country
    population = country_population.Population(1);
    % plot the normalised cases as a percentage
    stem(country_data.Date, 100*country_data.Confirmed/population);
end

% make it a bit nicer
title( 'Normalised confirmed cases – source: GitHub');
legend( countries_of_interest, 'location','northwest');
xlabel( 'Time'); ylabel( '% of total country population');
set( gca, 'Fontsize', 16);
axis tight; grid on; box on;
hold off;

[/Matlab]

This code will create a graph that will look like the following figure.

In pretty much the same way let’s create the normalised deaths graph.

[Matlab]

figure; hold on; set(gcf,'position',[1000 1000 800 500])

for c = 1:length(countries_of_interest)
    % read the deaths column from the data
    country_data = data(find(data.CountryName==countries_of_interest{c}), deaths_column);
    % read the population column from the data
    country_population = data(find(data.CountryName==countries_of_interest{c}), population_column);
    % select the population reported in the first occurrence of a country
    population = country_population.Population(1);
    % plot the normalised deaths as a percentage
    stem(country_data.Date, 100*country_data.Deaths/population);
end

% make it a bit nicer
title( 'Normalised deaths – source: GitHub');
legend( countries_of_interest, 'location','northwest');
xlabel( 'Time'); ylabel( '% of total country population');
set( gca, 'Fontsize', 16);
axis tight; grid on; box on;
hold off;

[/Matlab]

This code will create a graph that will look like the following figure.
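Since the two loops differ only in the column they read, they could be folded into one. The following is a sketch under assumptions, not the code used for the figures in this post: it builds a small hypothetical timetable named `sample` with made-up numbers (reusing the variable names from above) so that it runs on its own, and loops over the variable names `Confirmed` and `Deaths`.

[Matlab]
% hypothetical miniature data set (all numbers are made up)
Date = datetime(2020, 3, [1;2;3;1;2;3]);
CountryName = categorical({'Italy';'Italy';'Italy';'Spain';'Spain';'Spain'});
Confirmed = [100;150;230;80;120;200];
Deaths = [2;5;9;1;3;6];
Population = [60e6;60e6;60e6;47e6;47e6;47e6];
sample = table2timetable( table(Date, CountryName, Confirmed, Deaths, Population) );

countries = {'Italy','Spain'};
quantities = {'Confirmed','Deaths'};

for q = 1:numel(quantities)
    figure; hold on;
    for c = 1:numel(countries)
        % logical index of the rows that belong to the current country
        rows = sample.CountryName == countries{c};
        % population reported in the first occurrence of the country
        population = sample.Population(find(rows, 1));
        % plot the selected quantity as a percentage of the population
        stem(sample.Date(rows), 100 * sample.(quantities{q})(rows) / population);
    end
    title(['Normalised ' lower(quantities{q}) ' (hypothetical sample)']);
    legend(countries, 'location', 'northwest');
    xlabel('Time'); ylabel('% of total country population');
    grid on; box on; hold off;
end
[/Matlab]

Addressing the columns by name instead of by index also keeps the code valid if the column order in the source file changes.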

Following the same strategy, below are the respective graphs created using the data provided by `data.world` (which may look a bit different from the previous graphs, as that source contains either more or less data).

\* *It should be noted that working with data from data.world is a bit different, as those data do not include population information; that information should be acquired from other resources.*

\* *Please note that the data are not processed any further in this tutorial. They include values per province rather than per country, so those values should be aggregated before drawing meaningful conclusions!*

\* *The featured image in this post is an illustration of COVID-19 by the Public Health Image Library.*
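The two data caveats noted above can be sketched in code. The example below is illustrative only: the province rows, the `population_lookup` map and all of its numbers are hypothetical placeholders, and real population figures would have to come from an external source. It sums the per-province values into per-country totals with `findgroups` and `splitapply`, then normalises them with the looked-up population.

[Matlab]
% hypothetical per-province rows (all numbers are made up)
Date = datetime(2020, 3, [1;1;2;2]);
CountryName = categorical({'China';'China';'China';'China'});
RegionName = {'Hubei';'Guangdong';'Hubei';'Guangdong'};
Confirmed = [10;5;12;6];
provinces = table(Date, CountryName, RegionName, Confirmed);

% one group per (date, country) pair; sum the provinces within each group
[g, dates, names] = findgroups(provinces.Date, provinces.CountryName);
totals = table(dates, names, splitapply(@sum, provinces.Confirmed, g), ...
    'VariableNames', {'Date','CountryName','Confirmed'});

% hypothetical population figures, to be replaced with values from a trusted source
population_lookup = containers.Map({'China','Italy'}, [1.4e9, 60e6]);

% normalise each country total by its looked-up population
pop = cellfun(@(n) population_lookup(n), cellstr(totals.CountryName));
totals.ConfirmedPct = 100 * totals.Confirmed ./ pop;
disp(totals);
[/Matlab]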