Rees Morrison
4 min read · Sep 28, 2020

My previous nine articles have each focused on a different finding from an ever-growing set of blog posts that use R to investigate Covid-19 data. The links to those articles appear at the end of my most recent one, on the geographic focus of the posts.

For this article, I categorized each blog according to its web host site. Some hosts are sites maintained by a company, such as Grattan, or by a university, such as the University of Melbourne, where employees contribute the blog content. Those I categorize as “Organization” sites.

Other posts are published on a blog that either has a name given to it by the blogger (such as “Free Range Statistics” or “Freakonomics”) or has no name but a URL that starts with the blogger’s full name (such as “kieranhealy” or “peymankor”). I call all of those “Title” sites (but at some point I may separate the two styles of identification). Still other sites have no name or personal identifier after “http:” or “https:”, so I categorize them as “Untitled”.

The fourth category consists of aggregators of blogs (categorized as “Aggregators”). These sites collect and curate submissions from bloggers; R-Weekly and Medium are examples. When I find a post on a blog aggregator, if possible I click through to read and copy the original post. As a result, I don’t know how many of the posts were later featured on R-bloggers, for example. Because I did not keep track of how I learned about each of the 317 blog posts collected so far, I can only count a post under Aggregators if it was published on the aggregator in the first instance.
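The four categories can be expressed as a small lookup routine. What follows is only a minimal Python sketch of the idea, not the code behind this article (which was done in R, with much of the categorizing done by eye). The host lists `AGGREGATORS`, `ORGANIZATIONS`, and `TITLED`, the host names in them, and the function names are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables. The host names are illustrative assumptions,
# not the lists actually used for the article.
AGGREGATORS = {"medium.com", "rweekly.org"}
ORGANIZATIONS = {"grattan.edu.au", "unimelb.edu.au"}
TITLED = {"freerangestats.info", "kieranhealy.org"}


def host_of(url):
    """Return the lower-cased host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def matches(host, known_hosts):
    """True if host equals a known host or is a subdomain of one."""
    return any(host == k or host.endswith("." + k) for k in known_hosts)


def categorize(url):
    """Assign one of the four site categories to a post URL."""
    host = host_of(url)
    if matches(host, AGGREGATORS):
        return "Aggregators"
    if matches(host, ORGANIZATIONS):
        return "Organization"
    if matches(host, TITLED):
        return "Title"
    # No known blog name or personal identifier: fall back to Untitled.
    return "Untitled"


print(categorize("https://kieranhealy.org/blog/some-post/"))
```

In practice the Title versus Untitled decision is a judgment call about whether a name or personal identifier is present, so a hand-maintained list like `TITLED` stands in here for that judgment.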

To start the analysis of the 144 different blog sites, I have listed them in tables (using the gt package). If any reader has corrections, please drop me a line at rees [at] reesmorrison [dot] com.

The plot below shows the total number of posts as of this writing that fall into each category. The plot also goes one step further. Because it was easy to extract from the URL whether a site uses “http:” or “https:”, I created another descriptive variable to capture the two possibilities. Title sites are the most frequent type (207 posts, 65% of all the blog posts), and 26 of those posts (13%) use “http:”, as can be seen from the top segment of the Title column.
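Pulling the scheme out of each URL and tallying it by category takes only a few lines. The sketch below is a hedged Python illustration rather than the code used for the plot. The three sample rows and the helper names `scheme_of`, `tally`, and `pct` are made up for the example. Only the totals 317, 207, and 26 come from the article.

```python
from collections import Counter
from urllib.parse import urlparse


def scheme_of(url):
    """Return 'http' or 'https' (the part of the URL before '://')."""
    return urlparse(url).scheme


def tally(posts):
    """Count posts per (category, scheme) pair.

    posts is an iterable of (category, url) tuples.
    """
    return Counter((category, scheme_of(url)) for category, url in posts)


def pct(part, whole):
    """Percentage of part in whole, rounded to the nearest whole number."""
    return round(100 * part / whole)


# Made-up sample rows, only to show the shape of the data.
sample = [
    ("Title", "https://example.org/post-a"),
    ("Title", "http://example.net/post-b"),
    ("Untitled", "https://example.com/post-c"),
]
counts = tally(sample)
print(counts[("Title", "http")])

# Checking the percentages quoted in the text from the article's own counts.
print(pct(207, 317))  # share of all posts that are on Title sites
print(pct(26, 207))   # share of Title posts that use "http:"
```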

Another way to present the same data on site category and “http(s):” is with a mosaic plot. Below, the mosaic plot displays in a different way the dominance of “https:” over “http:”. By the way, “https:” is “http:” (HyperText Transfer Protocol, used to connect to web servers on the Internet) with encryption. The only difference between the two protocols is that “https:” uses Transport Layer Security (TLS, the successor to SSL, Secure Sockets Layer) to encrypt normal “http:” requests and responses. As a result, the newer “https:” is far more secure than “http:”.
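For readers unfamiliar with mosaic plots, the geometry is simple: each column's width is its category's share of all posts, and each rectangle's height within a column is the scheme's share of that category. The Python sketch below computes those proportions. It is an illustration under stated assumptions, not the plotting code used here. The function name `mosaic_geometry` is hypothetical, and the `toy` table holds invented counts, not the article's data.

```python
def mosaic_geometry(table):
    """Compute mosaic-plot proportions from a contingency table.

    table maps category -> {scheme: count}.
    Returns category -> (column_width, {scheme: height_within_column}),
    with categories in alphabetical order.
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    geometry = {}
    for category in sorted(table):
        row = table[category]
        total = sum(row.values())
        if total == 0:
            continue  # an empty category has no column to draw
        width = total / grand_total
        heights = {scheme: n / total for scheme, n in row.items()}
        geometry[category] = (width, heights)
    return geometry


# Invented toy counts, NOT the article's data.
toy = {
    "Title": {"https": 6, "http": 2},
    "Untitled": {"https": 2, "http": 0},
}
geom = mosaic_geometry(toy)
for category, (width, heights) in geom.items():
    print(category, width, heights)
```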

The width of each column conveys visually the predominance of Title blogs (the third column from the left). The columns are arranged alphabetically from left to right, as in the legend. The smaller rectangles at the bottom represent “http:”. Setting aside the encryption protocol, which may simply reflect older versus newer blogs, it’s clear that bloggers favor a moniker or personal identifier on their blog. They tend to choose a distinctive name for their blog, e.g. “I’m a Chordata! Urochordata!”, or put their full name in the URL. Titled blogs are easier to remember, to refer to, and to follow up on, so they are more effective at contributing to knowledge sharing.

Rees Morrison

An enthusiast of R programming, surveys, and data analysis/visualization