Luxembourg’s green Internet

Measuring the Internet:
Nature and environmental protection on the Luxembourg web

The Luxembourg Web Archive is more than a phone book for Luxembourgish websites. The data in our collections is a largely untapped resource for research. To gain a better understanding of the web, its contents and contributors, we decided to dive into the data with a special collection on Biodiversity and Environmental Protection. Our aim is to gather as many websites on this topic as possible and group them into different categories, which will then be analysed according to:
– Link structure
– Most common names of animals and plants
– Shared topics among categories

The Collection


Seed List

[Statistics panel: seeds, categories, total GB, MB per seed, total documents, average documents per seed]
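The per-seed figures in the collection statistics (MB per seed, average documents per seed) are simple ratios of the collection totals. The sketch below shows that arithmetic, assuming a hypothetical per-seed crawl record (`SeedCrawl`, with seed URL, category, bytes captured and document count) and decimal units (1 GB = 10⁹ bytes); neither the record format nor the unit convention is taken from the archive itself.

```python
from dataclasses import dataclass


@dataclass
class SeedCrawl:
    """Hypothetical per-seed crawl record (not an actual archive format)."""
    seed: str            # seed URL
    category: str        # category assigned in the seed list
    bytes_captured: int  # total bytes captured for this seed
    docs: int            # number of documents captured for this seed


def collection_stats(crawls: list[SeedCrawl]) -> dict[str, float]:
    """Derive the panel's figures from per-seed records (decimal units assumed)."""
    if not crawls:
        raise ValueError("no crawl records")
    n_seeds = len(crawls)
    total_bytes = sum(c.bytes_captured for c in crawls)
    total_docs = sum(c.docs for c in crawls)
    return {
        "seeds": n_seeds,
        "categories": len({c.category for c in crawls}),
        "total_gb": total_bytes / 10**9,
        "mb_per_seed": total_bytes / 10**6 / n_seeds,
        "total_docs": total_docs,
        "avg_docs_per_seed": total_docs / n_seeds,
    }


# Hypothetical sample records, for illustration only.
stats = collection_stats([
    SeedCrawl("https://seed-a.lu/", "blogs", 2_000_000_000, 100),
    SeedCrawl("https://seed-b.lu/", "state institutions", 1_000_000_000, 50),
])
print(stats)
```

Averages like these hide the uneven distribution of seeds and crawl sizes across categories, so per-category breakdowns are worth computing alongside them.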

Planting Seeds

Our initial research to build a seed list first led us to include organisations and state institutions in charge of promoting environmental protection. The scope of the collection then quickly expanded to organisations invested in climate activism and sustainability. All of these aspects naturally touch on gardening and agriculture, but they also lead towards the natural sciences and specialised media. We also decided to include blogs, Wikipedia articles and websites about sustainable products and businesses, to allow for interesting thematic comparisons between categories, such as youth organisations vs. state institutions.

We did not limit the number of websites, so seeds are not evenly distributed across categories. To find as many relevant websites as possible, we also asked organisations and public institutions promoting environmental protection to help us complete our seed list.

Growing Datasets

The results of analysing data from a web archive can only be as fruitful as the care that was put into building the underlying collection and datasets. There is no use in “trying to grab as much information as possible and seeing what comes out at the end”: with too much data, the technical barriers rise very quickly, and messy datasets won’t produce any comprehensible insights. We therefore decided to keep this collection relatively simple and limited ourselves to a single crawl of all seeds in the collection. Moreover, the crawl time was short, so the total amount of data captured is relatively low.

Thematic collections are meant to grow over time and become part of our regular crawls, so the topics of Biodiversity and Environmental Protection will continue to be preserved in the future. For the purpose of this project, however, we wanted to analyse a specific snapshot of this collection and compare different categories of websites.

Branches & Leaves

The graph on the right illustrates the outgoing links from websites on the seed list, showing which seeds link to the same websites. Unsurprisingly, sites like facebook.com, instagram.com and twitter.com are prominent junctions in the network. Others, such as public.lu or myenergy.lu, suggest that many websites link to governmental websites for official information about the energy transition and environmentalism.

The most surprising actors in this “treetop” visualisation are oekotopten.lu and smellslikeagreenspirit.com. Their high number of outgoing links is probably due to the nature of these websites: the former is a recommendation site for sustainable products run by the Mouvement écologique, and the latter is a commercially oriented blog about eco-friendly products, with ads.

Note that this visualisation does not show all outlying connections: for illustration purposes, only part of the network is displayed.
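The two readings of the link graph above correspond to two simple counts: a shared target such as facebook.com is a junction because many distinct seeds link to it (its in-degree), while sites like oekotopten.lu stand out because they link to many distinct domains (their out-degree). The following is a minimal sketch of those counts, not the tool used to draw the graph; it assumes a hypothetical `outlinks` mapping (page URL to the outgoing link URLs found on that page) already extracted from the crawl, and the seed domains in the sample are invented.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Host name of a URL without a leading 'www.' (e.g. 'www.public.lu' -> 'public.lu')."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def link_graph(outlinks: dict[str, list[str]]) -> dict[str, set[str]]:
    """Collapse page-level outlinks into seed domain -> set of external target domains."""
    graph: dict[str, set[str]] = {}
    for page_url, targets in outlinks.items():
        src = domain(page_url)
        for target in targets:
            dst = domain(target)
            if dst and dst != src:  # skip internal links and unparsable URLs
                graph.setdefault(src, set()).add(dst)
    return graph


def junctions(graph: dict[str, set[str]]) -> Counter:
    """In-degree of each target domain: how many distinct seeds link to it."""
    return Counter(dst for targets in graph.values() for dst in targets)


def out_degree(graph: dict[str, set[str]]) -> dict[str, int]:
    """Number of distinct external domains each seed links to."""
    return {src: len(targets) for src, targets in graph.items()}


# Hypothetical sample: page URL -> outgoing link URLs found on that page.
SAMPLE_OUTLINKS = {
    "https://www.seed-a.lu/": ["https://facebook.com/a", "https://www.public.lu/fr.html"],
    "https://seed-b.lu/news": ["https://www.facebook.com/b", "https://seed-b.lu/about"],
    "https://seed-c.lu/": ["https://facebook.com/c", "https://myenergy.lu/"],
}

graph = link_graph(SAMPLE_OUTLINKS)
print(junctions(graph).most_common(1))
print(out_degree(graph))
```

Stripping only "www." is a deliberately naive normalisation: subdomains of public.lu, for instance, would be counted as separate targets. Grouping by registered domain (e.g. via the Public Suffix List) would merge them.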

Bees & Trees

The word cloud on the left shows the most frequently mentioned plants and animals in the texts of the 849 seeds. Naturally, the Luxembourg web is a wild garden of languages and name variants. Latin names are also present, due to the inclusion of Wikipedia articles and natural-science websites. Bees apparently feature in many discussions: they appear prominently as Wildbienen and Abeilles, but also as Bestëbser, a Luxembourgish translation of the French pollinisateurs. The same goes for generic terms for trees and flowers, such as Bäume and Fleurs, but also for more specific names that are rarely used in everyday conversation, such as Panewippercher, Benjesheck, Jäizert or Juddekiischt.
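A word cloud of this kind is driven by term frequencies. The text does not describe how the names were counted; the sketch below assumes the simplest approach, matching tokens against a hypothetical multilingual lexicon of plant and animal names (the lexicon and sample texts are invented for illustration). Because the web texts mix German, French and Luxembourgish, it normalises case and Unicode form so that, for example, a precomposed and a decomposed "ä" count as the same letter.

```python
import re
import unicodedata
from collections import Counter

# \w is Unicode-aware for str patterns, so tokens like "bestëbser" and "bäume" stay whole.
TOKEN = re.compile(r"\w+")


def normalise(text: str) -> str:
    """Casefold, then NFC-normalise, so case and Unicode-form variants compare equal."""
    return unicodedata.normalize("NFC", text.casefold())


def name_frequencies(texts: list[str], lexicon: set[str]) -> Counter:
    """Count exact (normalised) matches of lexicon entries across all texts."""
    wanted = {normalise(name) for name in lexicon}
    counts: Counter = Counter()
    for text in texts:
        counts.update(tok for tok in TOKEN.findall(normalise(text)) if tok in wanted)
    return counts


# Hypothetical lexicon and texts, for illustration only.
LEXICON = {"Wildbienen", "Abeilles", "Bestëbser", "Bäume", "Fleurs"}
TEXTS = [
    "Wildbienen und andere Bestëbser brauchen Bäume.",
    "Les abeilles butinent les fleurs du jardin.",
    "Abeilles sauvages",
]
freqs = name_frequencies(TEXTS, LEXICON)
print(freqs.most_common(1))
```

Exact matching counts inflected forms (Wildbiene vs. Wildbienen) and translations (Abeilles vs. Bestëbser) separately, which is exactly why the same species can appear several times in a multilingual word cloud.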

Root connections

A closer look at the different categories in the collection allows for a topic analysis, in which buzzwords are detected within the full text of each website. The graph on the right illustrates the topic connections between categories in which the same buzzwords were encountered. The common terms we extracted are rather generic, such as green, sustainable, biodiversity and energy. Beyond these, we could identify two different perspectives on the general theme of the collection: a macro-perspective, with global challenges for society, like sustainability, transition énergétique, Klimawandel or Biolandwirtschaft, and a micro-perspective, with local activities that affect private citizens in Luxembourg, such as jardins, Antigaspi, déchets, Geméiskuerf, Gärtner and regional. Surprisingly, Schottergäert (gravel gardens) are mentioned several times, in different languages, suggesting that they are probably considered a nuisance to biodiversity in Luxembourg’s garden landscape.

The category of sustainable products was included in the collection with the notion of “greenwashing” websites in mind. Looking for traces of environmental values being instrumentalised for commercial interests, we can observe connections to the categories sustainable way of life, youth organisations and blogs. This might suggest an interlocking of youth culture, sustainable living and “green” products, especially beauty products, with buzzwords such as makeup, skincare and fashion.
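The category graph described above can be read as: two categories are connected when the same buzzword occurs in both, and the shared buzzwords label the edge. The text does not name the tool used for buzzword detection, so the following is only a minimal sketch under stated assumptions: whole-word (or whole-phrase) matching against a hypothetical buzzword list, applied to invented sample texts for three of the collection's categories.

```python
import re
import unicodedata
from itertools import combinations


def normalise(text: str) -> str:
    """Casefold, then NFC-normalise, so 'Déchets' and 'déchets' match."""
    return unicodedata.normalize("NFC", text.casefold())


def detect(text: str, buzzwords: set[str]) -> set[str]:
    """Buzzwords (single words or phrases) occurring as whole words in the text."""
    text = normalise(text)
    return {
        b for b in buzzwords
        if re.search(r"\b" + re.escape(normalise(b)) + r"\b", text)
    }


def category_topics(pages: dict[str, list[str]], buzzwords: set[str]) -> dict[str, set[str]]:
    """Union of the buzzwords detected in all pages of each category."""
    return {
        cat: set().union(*(detect(t, buzzwords) for t in texts))
        for cat, texts in pages.items()
    }


def shared_topics(topics: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Edges of the category graph: category pairs and the buzzwords they share."""
    edges = {}
    for a, b in combinations(sorted(topics), 2):
        common = topics[a] & topics[b]
        if common:
            edges[(a, b)] = common
    return edges


# Hypothetical buzzword list and page texts, for illustration only.
BUZZWORDS = {"sustainable", "transition énergétique", "déchets", "skincare", "fashion"}
PAGES = {
    "sustainable products": ["Vegan skincare and fashion, 100% sustainable."],
    "blogs": ["My sustainable skincare routine"],
    "state institutions": ["La transition énergétique et les déchets"],
}
edges = shared_topics(category_topics(PAGES, BUZZWORDS))
print(edges)
```

The number of shared buzzwords, `len(common)`, is a natural edge weight for drawing thicker lines between closely related categories. Like any keyword match, it only flags candidate connections, such as the one between sustainable products and blogs here, that still need to be checked by reading the pages.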

Seeing the wood for the trees

The example of alleged greenwashing in the biodiversity collection exposes the limits of distant reading in web archive data analysis. First, there is always a certain degree of bias in a dataset, which may also be influenced by the anticipated research question. Second, distant reading only offers a glimpse of possible directions, connections, exceptions and findings; following them up requires a close reading of individual websites.

Nonetheless, data analysis of web archive collections opens up a new set of possibilities for scientific research into our history and the history of the web itself. The National Library of Luxembourg will therefore continue to expand its exchange and collaboration with universities and research networks to improve the capabilities and accessibility of the Luxembourg Web Archive.