What is a URL?

URL (Uniform Resource Locator)

A URL (Uniform Resource Locator), colloquially known as a web address, is a unique sequence of characters that provides a means of locating digital information resources (digitally preserved intangible creations of the human intellect) over a network, including, but not limited to, the Internet. The located resource can then be retrieved using a transfer protocol such as HTTP (Hypertext Transfer Protocol) for hypertext documents or FTP (File Transfer Protocol) for files. Not every URL scheme names a transfer protocol, though: mailto, for example, designates an email address to which a message can be sent.

URL as URI (Uniform Resource Identifier)

A URL is a kind of URI (Uniform Resource Identifier): one that, besides identifying a resource, also describes how to locate it.

URI (Uniform Resource Identifier)

A URI (Uniform Resource Identifier) is a unique sequence of characters that identifies a resource. The resource can be physical, such as an object, a place, or a person, or logical (an intangible creation of the human intellect), such as an idea, a concept, a written document, a book, a song, a movie, a game, or a digital information resource.

URL Origins

The term URL was defined in 1994 by the inventor of the World Wide Web, Tim Berners-Lee, in RFC 1738, which he co-authored with Larry Masinter and Mark McCahill.

The conceptualization of the URL is closely linked to the development of the Hypertext Transfer Protocol (HTTP), which Tim Berners-Lee initiated at CERN in 1989. See What is HTTP? for further reading.

URI Syntax

As noted above, a URL is a kind of URI.

The generic URI syntax consists of five components:

scheme (required),
authority (optional),
path (required),
query (optional), and
fragment (optional).

URI components

scheme://authority/path?query#fragment

The authority component consists of three subcomponents:

userinfo (optional),
host (required), and
port (optional).

URI with userinfo, host & port subcomponents

scheme://userinfo@host:port/path?query#fragment

The userinfo subcomponent can be further divided into the following subcomponents:

username (required), and
password (optional).

Full URI syntax

scheme://username:password@host:port/path?query#fragment

It is worth remembering that the only required URI components are scheme and path (although the path may be empty, as in https://example.net).
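To make the component breakdown concrete, here is a minimal sketch using Python's standard `urllib.parse.urlsplit`, which splits a URI into the five generic components (it calls the authority `netloc`). The URI itself, including the credentials and port, is a made-up example.

```python
from urllib.parse import urlsplit

# Made-up URI that uses every component of the full syntax
uri = "https://johndoe:secret@www.example.net:8443/movies/comedies?director=scott#actors"

parts = urlsplit(uri)
print(parts.scheme)    # https
print(parts.netloc)    # johndoe:secret@www.example.net:8443 (the authority)
print(parts.path)      # /movies/comedies
print(parts.query)     # director=scott
print(parts.fragment)  # actors
```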

URL Syntax

A URL follows the generic URI syntax; however, its scheme component is commonly referred to as the protocol. Typical protocol values include file, http, https, ftp, and mailto.

Therefore, a URL consists of up to five components:

protocol (required),
authority (optional) consisting of optional userinfo, required host and optional port,
path (required),
query (optional), and
fragment (optional).

Of those five components, only protocol and path are required.

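A URL referring to a local file is an example of a URL with only the protocol and path specified. (Strictly speaking, such a URL contains an empty authority, which is why three slashes follow file:.)

A URL with only the protocol and path specified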

file:///Users/johndoe/Documents/mystuff.json
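Splitting the file URL above with Python's `urllib.parse.urlsplit` (a quick illustrative check) shows that the authority is present but empty, which is why three slashes follow `file:`.

```python
from urllib.parse import urlsplit

parts = urlsplit("file:///Users/johndoe/Documents/mystuff.json")
print(repr(parts.scheme))  # 'file'
print(repr(parts.netloc))  # '' (empty authority between "//" and the path)
print(repr(parts.path))    # '/Users/johndoe/Documents/mystuff.json'
```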

URL Protocol


The URL protocol (the counterpart of the URI scheme) is the first of the two required URL components, the second being the path. It denotes the set of rules and procedures governing how the URL resource should be accessed.

Common protocol values include file, http (together with its encrypted counterpart, https), ftp, and mailto.

URL Authority


The URL authority is an optional URL component that identifies the server (or servers) from which the URL resource is served. Optionally, it can also carry user credentials for authenticating with the server and the port through which the server is accessed.

The URL authority component consists of three subcomponents:

userinfo (optional),
host (required), and
port (optional).

URL Authority Userinfo


The URL authority userinfo is an optional subcomponent of the optional URL authority component. It provides a required username and an optional password used to authenticate with the URL host.

Because the password is included in the URL as plain text, this usage is discouraged; RFC 3986 deprecates the user:password format in the userinfo subcomponent.
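The sketch below, using Python's `urllib.parse`, shows how easily a password embedded in a URL can be read back, and one way to strip the userinfo before logging or sharing the URL. The FTP URL and credentials are hypothetical, and rebuilding the authority from the host alone assumes the URL has no explicit port.

```python
from urllib.parse import urlsplit

# Hypothetical URL with credentials in the userinfo subcomponent
parts = urlsplit("ftp://johndoe:secret@ftp.example.net/pub/readme.txt")
print(parts.username)  # johndoe
print(parts.password)  # secret (plain text, visible to anyone who sees the URL)
print(parts.hostname)  # ftp.example.net

# Drop the userinfo by rebuilding the authority from the host only
# (assumes no explicit port, which this rebuild would also drop)
redacted = parts._replace(netloc=parts.hostname or "").geturl()
print(redacted)        # ftp://ftp.example.net/pub/readme.txt
```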

URL Authority Host


The URL authority host is a required subcomponent of the optional URL authority component; it identifies the server (or servers) from which the URL resource is served.

The host subcomponent can be:

a registered domain name (e.g. example.net), which can be prefixed with an optional subdomain (e.g. www), or
an IP address (e.g. 127.0.0.1, or an IPv6 address enclosed in square brackets, such as [::1]).

In a registered domain name, the trailing part (e.g. com, net, co.uk) is referred to as the domain suffix; as co.uk shows, a suffix can span more than one dot-separated label.

A registered domain name is mapped to one or more IP addresses by the Domain Name System (DNS).

Conversely, many registered domain names can map to a single IP address, for example when so-called virtual hosting is used.

A URL with a registered domain name as the host

https://www.example.net/movies/?director=scott

A URL with an IP address as the host

http://127.0.0.1:8080/users
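Whether a host is a registered domain name or an IP address can be checked with Python's standard `ipaddress` module; the helper name `host_kind` is made up for this sketch. `urlsplit(...).hostname` returns the host lowercased and, for IPv6 literals, without the square brackets.

```python
import ipaddress
from urllib.parse import urlsplit

def host_kind(url: str) -> str:
    """Report whether a URL's host is an IP address or a domain name."""
    host = urlsplit(url).hostname  # lowercased; IPv6 brackets removed
    if host is None:
        return "no host"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "domain name"  # anything that does not parse as an IP
    return f"IPv{ip.version} address"

print(host_kind("https://www.example.net/movies/?director=scott"))  # domain name
print(host_kind("http://127.0.0.1:8080/users"))                     # IPv4 address
print(host_kind("http://[::1]:8080/users"))                         # IPv6 address
```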

URL Authority Port


The URL authority port is an optional subcomponent of the optional URL authority component. It identifies the network port, and thus the server process (service), on the URL host to which the client connects.

Port numbers 0 to 1023 are the so-called well-known ports, which the IANA assigns to specific protocols and services.

Port 8080 is commonly used as an alternative port for web servers (e.g. development or proxy servers). When no port is specified, HTTP implicitly uses port 80 and HTTPS uses port 443, the well-known ports registered for these protocols.

A URL with an explicit port number

https://example.net:443
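When the port is omitted, a client has to fall back on the implicit default. A minimal Python sketch, assuming only the two web schemes need defaults (`effective_port` and `DEFAULT_PORTS` are illustrative names, not a library API):

```python
from typing import Optional
from urllib.parse import urlsplit

# Implicit ports for the two web schemes (illustrative, not exhaustive)
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> Optional[int]:
    """Return the explicit port, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:  # .port is None when the URL omits it
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("https://example.net:443"))      # 443
print(effective_port("https://example.net"))          # 443 (implicit)
print(effective_port("http://127.0.0.1:8080/users"))  # 8080
print(effective_port("http://example.net"))           # 80 (implicit)
```

This is also why https://example.net:443 and https://example.net point at the same server port.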

URL Path


The URL path is the second of the two required URL components (the first being the protocol). It denotes the location of the URL resource, whereas the protocol denotes the rules and procedures for accessing it.

A path can consist of one segment (e.g. /movies) or many segments (e.g. /movies/comedies).
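Splitting a path into its segments is a small string operation; a Python sketch (the helper name `path_segments` is hypothetical, and the segments it returns are still percent-encoded):

```python
from urllib.parse import urlsplit

def path_segments(url: str) -> list[str]:
    """Return the non-empty segments of a URL's path."""
    path = urlsplit(url).path
    # Splitting on "/" yields empty strings for the leading slash,
    # a trailing slash, or "//"; this sketch discards them
    return [segment for segment in path.split("/") if segment]

print(path_segments("https://www.example.net/movies"))           # ['movies']
print(path_segments("https://www.example.net/movies/comedies"))  # ['movies', 'comedies']
```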

URL Query


The URL query is an optional URL component that carries parameters to be used when accessing the URL resource.

In a URL, the query component is preceded by a question mark (?).

The generic syntax does not prescribe the internal structure of the query, but it is usually a sequence of key=value pairs separated by the ampersand (&) delimiter.

A URL with query parameters

https://www.example.net/movies?director=scott&year=1982
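Python's `urllib.parse.parse_qs` and `urlencode` implement the common key=value&key=value convention (the application/x-www-form-urlencoded format), which is a convention rather than a requirement of the URL syntax. `parse_qs` returns lists because a key may appear more than once.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://www.example.net/movies?director=scott&year=1982"
params = parse_qs(urlsplit(url).query)
print(params)  # {'director': ['scott'], 'year': ['1982']}

# Building a query string goes the other way
print(urlencode({"director": "scott", "year": 1982}))  # director=scott&year=1982
```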

URL Fragment


The URL fragment is an optional URL component that identifies a secondary resource within the primary resource, such as an element of an HTML document identified by its id attribute. The fragment is processed by the client and is not sent to the server in an HTTP request.

In a URL, the fragment component is preceded by the hash symbol (#).

A URL with query parameters and a fragment

https://www.example.net/movies?director=scott&year=1982#actors
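Because the fragment is resolved by the client, a client typically separates it from the rest of the URL before sending a request. Python's `urllib.parse.urldefrag` performs exactly that split:

```python
from urllib.parse import urldefrag

url = "https://www.example.net/movies?director=scott&year=1982#actors"
result = urldefrag(url)
print(result.url)       # https://www.example.net/movies?director=scott&year=1982
print(result.fragment)  # actors
```

In a browser, the fragment actors would then scroll to an element such as `<section id="actors">` (a hypothetical element), if the page contains one.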

URLs in HTTP Requests

URLs are used in HTTP requests to identify the resource being accessed.

Resources accessed over HTTP are often HTML documents, but they can equally be images, stylesheets, scripts, JSON data, or any other kind of file.
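To show where the URL components end up in an HTTP/1.1 request, here is a simplified Python sketch. The `request_head` helper is hypothetical; real clients send more headers and, when talking to a proxy, use the full URL as the request target. The path and query form the request target, the host and port go into the Host header, and the fragment is not sent at all.

```python
from urllib.parse import urlsplit

def request_head(url: str, method: str = "GET") -> str:
    """Sketch: build the request line and Host header for a URL."""
    parts = urlsplit(url)
    # Request target: path plus query; the fragment is never sent
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Host header: the authority without any userinfo
    host = parts.netloc.rpartition("@")[2]
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"

print(request_head("https://www.example.net/movies?director=scott&year=1982#actors"))
```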

URLs in Hyperlinks

URLs can be used in hyperlinks.

Hyperlink (aka link)

A hyperlink (aka link) is a user-followable reference to a digital resource.

For example, a hyperlink in HTML is built with the anchor element, whose href (hypertext reference) attribute can be set to a URL.

<a
  href="https://soundof.it/http/tutorial/what-is-http"
>
  What is HTTP?
</a>

An href attribute's value can also be a relative URL.

Relative URL

A relative URL is a URL that omits some leading components, typically the protocol and authority. When the relative URL is resolved, the missing components are taken from the URL of the currently accessed resource (the base URL).

<a
  href="/http/tutorial/what-is-http"
>
  What is HTTP?
</a>
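Resolving a relative URL against the current page's URL is what Python's `urllib.parse.urljoin` does, following the RFC 3986 resolution rules. The base URL below is a hypothetical page on the same site as the links above.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the link
base = "https://soundof.it/http/tutorial/what-is-url"

print(urljoin(base, "/http/tutorial/what-is-http"))  # https://soundof.it/http/tutorial/what-is-http
print(urljoin(base, "what-is-http"))                 # https://soundof.it/http/tutorial/what-is-http
print(urljoin(base, "../faq"))                       # https://soundof.it/http/faq
```

A leading slash keeps only the protocol and authority of the base URL, while a path without a leading slash replaces just the last segment of the base path.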

URL Converting & Encoding

A URL can only be used for data transmission over the Internet when it consists of URL-safe ASCII characters.

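Non-ASCII characters in a domain name must be converted to the so-called Punycode, which consists only of ASCII characters, and each converted label is prefixed with xn--. Non-ASCII characters in other components, such as the path or query, are instead percent-encoded as UTF-8 bytes.

For example, a domain name label consisting of the non-ASCII character ♥ needs to be converted into xn--g6h.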
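Python ships a `punycode` codec that performs the raw Punycode step for a single label; prefixing `xn--` yields the ASCII form used in DNS. A minimal sketch (`to_ascii_label` and `to_ascii_domain` are hypothetical helpers) that, unlike full IDNA processing (UTS #46), skips case mapping, normalization, and validation:

```python
def to_ascii_label(label: str) -> str:
    """Sketch: Punycode one domain-name label and add the xn-- prefix.

    No IDNA normalization or validation is performed here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def to_ascii_domain(domain: str) -> str:
    # Each dot-separated label is converted independently
    return ".".join(to_ascii_label(label) for label in domain.split("."))

print(to_ascii_label("♥"))                 # xn--g6h
print(to_ascii_domain("münchen.example"))  # xn--mnchen-3ya.example
```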

An unsafe ASCII character to be used within a URL needs to be percent-encoded, i.e. replaced by a % sign followed by the two-digit hexadecimal value of the character's byte.

An example of an unsafe ASCII character is the space character, which is encoded as %20 (or as + within form-encoded query strings).
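Python's `urllib.parse.quote` and `quote_plus` implement percent-encoding, and the difference between them is exactly the %20-versus-+ distinction for spaces. The sample strings are made up.

```python
from urllib.parse import quote, quote_plus, unquote

print(quote("my file.json"))       # my%20file.json (space as %20, e.g. in a path)
print(quote_plus("ridley scott"))  # ridley+scott (space as +, form-encoded query)
print(quote("50%"))                # 50%25 (the % sign itself must be encoded)
print(unquote("my%20file.json"))   # my file.json
```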

IRI (Internationalized Resource Identifier)


An IRI (Internationalized Resource Identifier) is a generalization of the URI that, for internationalization purposes, can contain Unicode characters (as opposed to URL-safe ASCII characters only).

Most modern browsers support IRIs.

For transmission over the Internet, an IRI is converted into an ASCII-only URI: non-ASCII characters in the domain name are converted into the so-called Punycode, and those in other components are percent-encoded as UTF-8 bytes.

For example, a domain name label consisting of the Unicode character 人 needs to be converted into xn--gmq.

A domain name in IRI with internationalized characters is known as an Internationalized Domain Name (IDN).
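A rough Python sketch of the IRI-to-URI conversion: domain-name labels go through Punycode and the remaining components are percent-encoded as UTF-8. It assumes a domain-name host (no IPv6 literal) and no userinfo; the `iri_to_uri` helper and the sample IRI are hypothetical, and a production implementation would also apply full IDNA (UTS #46) processing to the host.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri: str) -> str:
    """Sketch: convert an IRI into an ASCII-only URI.

    Assumes a domain-name host (no IPv6 literal) and no userinfo.
    """
    parts = urlsplit(iri)
    # Host: Punycode each non-ASCII label and add the xn-- prefix
    host = ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in (parts.hostname or "").split(".")
    )
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Other components: percent-encode as UTF-8, keeping delimiters
    # and any existing %XX escapes
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(iri_to_uri("https://人.example/人/?q=人"))
# https://xn--gmq.example/%E4%BA%BA/?q=%E4%BA%BA
```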