What is URL?
URL (Uniform Resource Locator), colloquially known as a web address, is a unique sequence of characters that provides a means of locating digital information resources (digitally preserved intangible creations of the human intellect) over a network, including, but not limited to, the Internet. Such resources can then be retrieved using various transfer protocols, such as HTTP (Hypertext Transfer Protocol) for hypertext documents and FTP (File Transfer Protocol) for files; a URL can also use a scheme such as mailto, which addresses an email recipient rather than naming a transfer protocol.
URL as URI (Uniform Resource Identifier)
URL is a kind of URI (Uniform Resource Identifier).
URI (Uniform Resource Identifier) is a unique sequence of characters that provides means for identifying physical resources, such as physical objects, places, and people, and logical resources (intangible creations of the human intellect), such as ideas, concepts, written documents, books, songs, movies, games, and digital information resources.
URL Origins
The term URL was defined by the inventor of the World Wide Web, Tim Berners-Lee, in 1994.
The conceptualization of URL is interlinked with the development of the Hypertext Transfer Protocol (HTTP) initiated by Tim Berners-Lee at CERN in 1989. See What is HTTP? for further reading.
URI Syntax
As noted above, URL is a kind of URI.
The generic URI syntax is comprised of five components:
- scheme (required),
- authority (optional),
- path (required),
- query (optional), and
- fragment (optional).
The authority component is comprised of three subcomponents:
- userinfo (optional),
- host (required), and
- port (optional).
The userinfo subcomponent can be further divided into the following subcomponents:
- username (required), and
- password (optional).
It is worth remembering that the only required URI components are scheme and path.
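The component breakdown above can be illustrated with Python's standard `urllib.parse` module. This is only a minimal sketch: the example URI and its values (`alice`, `secret`, `example.com`, and so on) are made up for illustration, and `urlsplit` names the authority component `netloc`.

```python
from urllib.parse import urlsplit

# Hypothetical URI that contains all five components.
uri = "https://alice:secret@example.com:8443/movies/comedies?year=1994#reviews"
parts = urlsplit(uri)

print(parts.scheme)    # scheme
print(parts.netloc)    # authority (userinfo, host, and port together)
print(parts.path)      # path
print(parts.query)     # query, without the leading "?"
print(parts.fragment)  # fragment, without the leading "#"

# Subcomponents of the authority.
print(parts.username, parts.password, parts.hostname, parts.port)
```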
URL Syntax
URL observes the generic URI syntax; however, the scheme component is referred to as the protocol. The protocol component can include such values as file, http, https, ftp, and mailto.
Therefore, a URL can be comprised of five components:
- protocol (required),
- authority (optional) consisting of optional userinfo, required host and optional port,
- path (required),
- query (optional), and
- fragment (optional).
Of those five components, only protocol and path are required.
A URL referring to a local file is an example of a URL with only protocol and path specified.
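As a rough illustration of URLs that carry only a protocol and a path, the sketch below parses two made-up URLs with Python's `urllib.parse`. The file path and the email address are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URLs without authority, query, or fragment.
local_file = urlsplit("file:/home/user/notes.txt")
mail = urlsplit("mailto:someone@example.com")

for parts in (local_file, mail):
    # Only scheme (protocol) and path are non-empty.
    print(parts.scheme, parts.path, repr(parts.netloc), repr(parts.query), repr(parts.fragment))
```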

URL Protocol
URL protocol, the counterpart of the URI scheme, is the first of the two required URL components (the second being the path). It denotes a set of rules and procedures governing the manner in which the URL resource should be accessed.
Common protocol values include file, http (together with its encrypted counterpart https), ftp, and mailto.
URL Path
URL path is the second of the two required URL components (the first one being the protocol). It denotes the logical location of the URL resource (whereas the protocol denotes the rules and procedures regarding the manner of resource access).
A path can consist of one segment (e.g. /movies) or many segments (e.g. /movies/comedies).
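A path can be split into its segments with a few lines of Python. The helper name `path_segments` and the `example.com` URLs are assumptions made for this sketch.

```python
from urllib.parse import urlsplit

def path_segments(url: str) -> list[str]:
    """Return the non-empty segments of the URL's path (hypothetical helper)."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]

print(path_segments("https://example.com/movies"))
print(path_segments("https://example.com/movies/comedies"))
```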
URL Query
URL query is an optional URL component which denotes parameters that are to be used during the URL resource access.
In a URL the query component is preceded by the question mark (?).
The query syntax is not clearly defined, but it is usually a sequence of key-value pairs separated by the ampersand (&) delimiter.
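The common key-value convention can be sketched with `parse_qs` and `urlencode` from Python's `urllib.parse`. The URL and the parameter names (`genre`, `year`) are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL with two query parameters.
url = "https://example.com/movies?genre=comedy&year=1994"

query = urlsplit(url).query   # the text after "?"
params = parse_qs(query)      # each key maps to a list of values
print(params)

# Going the other way: build a query string from key-value pairs.
rebuilt = urlencode({"genre": "comedy", "year": 1994})
print(rebuilt)
```

`parse_qs` returns lists because the same key may legitimately appear more than once in a query.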
URL Fragment
URL fragment is an optional URL component which denotes the logical location of a secondary resource within the URL resource, such as an element in an HTML document identified by its ID attribute.
In a URL the fragment component is preceded by the hash symbol (#).
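Separating the fragment from the rest of a URL can be sketched with `urldefrag` from Python's `urllib.parse`. The URL and the `reviews` identifier are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical URL pointing at an element with id="reviews".
primary, fragment = urldefrag("https://example.com/movies/comedies#reviews")

print(primary)   # URL of the primary resource
print(fragment)  # identifier of the secondary resource
```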
URLs in HTTP Requests
URLs are used in HTTP requests to indicate resources being accessed.
Resources that can be accessed using HTTP and URLs are primarily HTML documents.
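To show how a URL maps onto an HTTP request, the sketch below builds the text of a minimal HTTP/1.1 GET request without sending it. The helper `build_get_request` and the URL are assumptions; the sketch assumes the URL has no userinfo and ignores everything a real client would add (further headers, connection handling).

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    parts = urlsplit(url)
    # The request target is the path plus the query; the fragment is not sent.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"  # assumes the authority has no userinfo
        "\r\n"
    )

request = build_get_request("http://example.com/movies/comedies?year=1994#reviews")
print(request)
```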
URLs in Hyperlinks
URLs can be used in hyperlinks.
A hyperlink (also known as a link) is a user-followable reference to a digital resource.
For example, a hyperlink in HTML is built using the anchor element, and the value of its href (short for hypertext reference) attribute can be specified as a URL.
<a
href="https://soundof.it/http/tutorial/what-is-http"
>
What is HTTP?
</a>
The href attribute's value can also be specified as a relative URL.
A relative URL is a URL in which the protocol and authority components are not indicated explicitly but implicitly, through substitution with the relevant values from the currently accessed resource's URL.
<a
href="/http/tutorial/what-is-http"
>
What is HTTP?
</a>
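The substitution described above can be sketched with `urljoin` from Python's `urllib.parse`. The URL of the currently accessed page is an assumption made up for this example; only the relative link comes from the snippet above.

```python
from urllib.parse import urljoin

# Assumed URL of the page that contains the link (hypothetical).
current_page = "https://soundof.it/http/tutorial/some-page"

# The relative URL from the href attribute above.
relative = "/http/tutorial/what-is-http"

# Protocol and authority are taken from the current page's URL.
absolute = urljoin(current_page, relative)
print(absolute)
```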
URL Converting & Encoding
A given URL can only be used for data transmission over the Internet when it consists of URL-safe ASCII characters.
A non-ASCII character in the host component of a URL needs to be converted to the so-called Punycode, which consists only of safe ASCII characters; in the other components its UTF-8 bytes are percent-encoded instead.
An example of a non-ASCII character is ♥ which, to be used within a host name, needs to be converted into xn--g6h.
An unsafe ASCII character to be used within a URL needs to be percent-encoded, that is, replaced with a % prefix followed by the two-digit hexadecimal value of the character.
An example of an unsafe ASCII character is the space character, which needs to be encoded into %20 (or into + within form-encoded query strings).
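Both kinds of conversion can be tried out with Python's standard library. This is a minimal sketch: the file name is invented, and the Punycode step uses Python's raw `punycode` codec with the `xn--` prefix added by hand, which skips the normalization that full internationalized domain name processing performs.

```python
from urllib.parse import quote, quote_plus, unquote

print(quote("my file.txt"))       # space becomes %20
print(quote_plus("my file.txt"))  # space becomes + (form-style query encoding)
print(quote("♥"))                 # non-ASCII: each UTF-8 byte is percent-encoded
print(unquote("my%20file.txt"))   # decoding reverses percent-encoding

# Punycode is used for host names; "xn--" marks an encoded label.
heart_label = "xn--" + "♥".encode("punycode").decode("ascii")
print(heart_label)
```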
IRI (Internationalized Resource Identifier)
IRI (Internationalized Resource Identifier) is an extension of the URI (and thus of the URL) which, for internationalization purposes, can consist of Unicode characters (as opposed to the standard URL-safe ASCII characters).
Most modern browsers support IRIs.
For the purpose of Internet transmission, non-ASCII characters in the host of an IRI are converted into the so-called Punycode, which consists only of ASCII characters, while non-ASCII characters in the other components are percent-encoded.
An example of a Unicode non-ASCII character is 人 which, in a host name, needs to be converted into xn--gmq.
A domain name in IRI with internationalized characters is known as an Internationalized Domain Name (IDN).
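The conversion of an IRI into an ASCII-only URL can be sketched as follows. The helpers `to_ascii_label` and `iri_to_uri` and the example IRI are hypothetical. The sketch assumes an IRI with a host and no userinfo or port, and it applies raw Punycode to each host label without the normalization that full IDN processing requires.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_label(label: str) -> str:
    """Punycode a single host label if it contains non-ASCII characters."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def iri_to_uri(iri: str) -> str:
    """Convert scheme://host/path?query#fragment to ASCII (simplified sketch)."""
    parts = urlsplit(iri)
    # Host: Punycode, label by label.
    host = ".".join(to_ascii_label(label) for label in parts.hostname.split("."))
    # Other components: percent-encode the UTF-8 bytes of non-ASCII characters.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(iri_to_uri("https://人.example/人?q=人"))
```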