Dictionary

Important terms and expressions

The Internet

is an electronic communications network that connects computers around the world.

merriam-webster
The web

is the part of the Internet that can be accessed by a browser. For instance, email and apps are also part of the Internet, but not the web.

merriam-webster
Website

is a form of online publication: a set of pages linked together and browsable on the Internet.

merriam-webster
Web archiving

is the practice of downloading and archiving parts of the web in order to preserve its contents and ensure long-term access to information.

Domain

is a subdivision of the Internet denoted in an address with a unique abbreviation (such as .lu, or .com).

merriam-webster
URL

is the address of a resource (such as a document or website) on the Internet.

merriam-webster
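
To make the relation between a URL and its domain concrete, here is a minimal sketch that splits an address into its parts using Python's standard library. The address `https://www.example.lu/news/article.html` and the helper name `describe_url` are made up for illustration; taking the last label of the host name as the domain abbreviation is a simplification.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into a few named parts (illustrative helper, not an archive tool)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Simplification: treat the last dot-separated label of the host as the domain abbreviation.
    top_level_domain = "." + host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "scheme": parts.scheme,
        "host": host,
        "top_level_domain": top_level_domain,
        "path": parts.path,
    }


# Hypothetical example address
info = describe_url("https://www.example.lu/news/article.html")
print(info)
```
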
Seed

is a URL used as the starting point for a web crawl. One seed can lead to a number of different pages, so the more seeds are “sown”, the more extensive the results of a web harvest will be.

Seed list

comprises all the seeds that were used to build a collection. This list gives you an idea of which websites can be found in the collection. However, it doesn’t necessarily mean that every page of every website was captured.

Harvest

is the process of crawling and downloading parts of the Internet. In the context of web archiving, the term is often used as a synonym for web crawl.

Web crawler

also called a spider, is a program that scans every element of a website, following every link and tracing every component on every page. Search engines also use crawlers for web indexing: frequent crawls keep the index current, which allows for faster and more efficient search results.
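
The way a crawler moves from a seed to further pages can be sketched in a few lines. The following is a simplified illustration, not the software used by any particular archive: it follows only `<a href>` links, works on a small made-up site held in memory (`FAKE_SITE`) instead of the live web, and the names `LinkExtractor`, `crawl` and `fetch` are assumptions chosen for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: start from the seeds and follow links page by page.

    `fetch` is any function that returns the HTML of a URL, or None if the
    page is unavailable (an assumption made for this sketch).
    """
    queue = deque(seeds)
    seen = set(seeds)
    captured = []
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return captured


# A made-up three-page website standing in for the live web
FAKE_SITE = {
    "https://example.lu/": '<a href="/about.html">About</a> <a href="news.html">News</a>',
    "https://example.lu/about.html": '<a href="/">Home</a>',
    "https://example.lu/news.html": '<a href="/about.html#team">Team</a>',
}

pages = crawl(["https://example.lu/"], FAKE_SITE.get)
print(pages)
```

Here a single seed leads to three captured pages. A real crawler would additionally download images, stylesheets and scripts, and would respect crawl limits and politeness rules.
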

Collection policy

is the description of standards and procedures followed while building a collection. A detailed policy helps in understanding the contents and limitations of a collection and informs the user about the web archive’s operating principles.

Broad crawls and targeted crawls

Broad crawls capture a snapshot of a large number of seeds, in our case all .lu domains, which we capture twice a year.
Targeted crawls aim at a specific topic or event, potentially with a higher frequency of captures of a smaller number of seeds.
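
One way to picture the difference is as a scope rule that decides whether a discovered URL belongs to the crawl. The sketch below is an assumption-laden illustration, not the Luxembourg Web Archive's actual configuration: the function names and the example host `elections.example.lu` are hypothetical.

```python
from urllib.parse import urlsplit


def in_broad_scope(url, tld=".lu"):
    """Broad crawl: accept any host under the given top-level domain."""
    host = urlsplit(url).hostname or ""
    return host.endswith(tld)


def in_targeted_scope(url, seed_hosts):
    """Targeted crawl: accept only hosts that appear in a curated seed list."""
    host = urlsplit(url).hostname or ""
    return host in seed_hosts


# Hypothetical seed hosts for a topic-specific collection
election_hosts = {"elections.example.lu"}

print(in_broad_scope("https://www.example.lu/page"))
print(in_targeted_scope("https://www.example.lu/page", election_hosts))
```
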

Missing something?

What terms and expressions do you think are missing from this page?
Help us expand the Luxembourg Web Archive dictionary by sending in your questions and suggestions.

Also remember that we are looking for contributions of new and noteworthy websites to be included in the archive. Simply contact us, or use the submission form under “participate and contribute” below: