What is URL?
URL as URI (Uniform Resource Identifier)
A URL (Uniform Resource Locator) is a kind of URI (Uniform Resource Identifier).
URL Origins
The term URL was defined by the inventor of the World Wide Web, Tim Berners-Lee, in 1994.
The conceptualization of URL is interlinked with the development of the Hypertext Transfer Protocol (HTTP) initiated by Tim Berners-Lee at CERN in 1989. See What is HTTP? for further reading.
URI Syntax
As noted above, a URL is a kind of URI.
The generic URI syntax is comprised of five components:
- scheme (required),
- authority (optional),
- path (required),
- query (optional), and
- fragment (optional).
The authority component is comprised of three subcomponents:
- userinfo (optional),
- host (required), and
- port (optional).
The userinfo subcomponent can be further divided into the following subcomponents:
- username (required), and
- password (optional).
It is worth remembering that the only required URI components are scheme and path.
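As a rough illustration of the components listed above, the following sketch splits a URI with Python's standard `urllib.parse.urlsplit`. The URI itself (host, credentials, path, and query values) is a made-up example, not taken from any real service.

```python
from urllib.parse import urlsplit

# Hypothetical URI used only for illustration.
uri = "https://alice:secret@example.com:8443/movies/comedies?year=1994&sort=title#cast"

parts = urlsplit(uri)

# The five top-level components.
print(parts.scheme)    # https
print(parts.netloc)    # alice:secret@example.com:8443 (the authority)
print(parts.path)      # /movies/comedies
print(parts.query)     # year=1994&sort=title
print(parts.fragment)  # cast

# Subcomponents of the authority.
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # example.com
print(parts.port)      # 8443 (returned as an integer)
```

Note that `urlsplit` calls the authority `netloc`, and that it strips the `?` and `#` delimiters from the query and fragment.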
URL Syntax
A URL follows the generic URI syntax; however, the scheme component is referred to as the protocol. The protocol component can take values such as file, http, https, ftp, and mailto.
Therefore, a URL can consist of five components:
- protocol (required),
- authority (optional), consisting of an optional userinfo, a required host, and an optional port,
- path (required),
- query (optional), and
- fragment (optional).
Of those five components, only protocol and path are required.
A URL referring to a local file is an example of a URL with only the protocol and path specified.
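To see what a URL with only a protocol and a path looks like once parsed, here is a small sketch using `urllib.parse.urlsplit`. The file path and the e-mail address are invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URLs, for illustration only.
local_file = urlsplit("file:/home/user/notes.txt")
mail = urlsplit("mailto:someone@example.com")

for parts in (local_file, mail):
    # Only the protocol (scheme) and the path carry a value.
    print(parts.scheme, parts.path)
    # The authority, query, and fragment are all empty strings.
    print(repr(parts.netloc), repr(parts.query), repr(parts.fragment))
```

Local file URLs are also commonly written with an empty authority, as in `file:///home/user/notes.txt`; `urlsplit` reports the same scheme and path for that form.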
URL Protocol
Common protocol values are file, http (together with its encrypted counterpart https), ftp, and mailto.
URL Path
A path can consist of one segment (e.g. /movies) or many segments (e.g. /movies/comedies).
URL Query
In a URL, the query component is preceded by a question mark (?).
The query syntax is not strictly defined, but it is usually a sequence of key-value pairs separated by the ampersand (&) delimiter, with each key joined to its value by an equals sign (=).
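The usual key-value convention can be sketched with Python's `urllib.parse` helpers. The URL and its parameter names (`genre`, `year`) are assumptions chosen for the example.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit

# Hypothetical URL; a key may legitimately appear more than once.
url = "https://example.com/movies?genre=comedy&year=1994&year=1995"

query = urlsplit(url).query
print(query)  # genre=comedy&year=1994&year=1995

# Ordered list of (key, value) pairs.
pairs = parse_qsl(query)
print(pairs)  # [('genre', 'comedy'), ('year', '1994'), ('year', '1995')]

# Dictionary that groups repeated keys into lists.
grouped = parse_qs(query)
print(grouped)  # {'genre': ['comedy'], 'year': ['1994', '1995']}

# Going the other way: build a query string from pairs.
rebuilt = urlencode(pairs)
print(rebuilt)  # genre=comedy&year=1994&year=1995
```

Because the query syntax is only a convention, a server is free to interpret repeated keys or other separators differently.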
URL Fragment
In a URL, the fragment component is preceded by the hash symbol (#).
URLs in HTTP Requests
URLs are used in HTTP requests to indicate resources being accessed.
Resources that can be accessed using HTTP and URLs are primarily HTML documents.
URLs in Hyperlinks
URLs can be used in hyperlinks.
For example, a hyperlink in HTML is built using the anchor element, and the value of its href (hypertext reference) attribute can be specified as a URL.
<a
href="https://soundof.it/http/tutorial/what-is-http"
>
What is HTTP?
</a>
The value of an href attribute can also be specified as a relative URL.
<a
href="/http/tutorial/what-is-http"
>
What is HTTP?
</a>
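A relative URL is resolved against the URL of the page that contains the link. The sketch below uses `urllib.parse.urljoin` to show that resolution; the base URL is an assumed page address, chosen only so that the result matches the absolute link shown earlier.

```python
from urllib.parse import urljoin

# Assumed address of the page containing the hyperlink (hypothetical).
base = "https://soundof.it/http/tutorial/what-is-url"

# A path starting with "/" replaces the whole path of the base URL.
absolute = urljoin(base, "/http/tutorial/what-is-http")
print(absolute)  # https://soundof.it/http/tutorial/what-is-http

# A path without a leading "/" replaces only the last segment.
sibling = urljoin(base, "what-is-http")
print(sibling)  # https://soundof.it/http/tutorial/what-is-http

# ".." steps up one level before the new segment is appended.
parent = urljoin(base, "../reference")
print(parent)  # https://soundof.it/http/reference
```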
URL Converting & Encoding
A given URL can only be used for data transmission over the Internet when it consists of URL-safe ASCII characters.
A non-ASCII character used in the host name of a URL needs to be converted to so-called Punycode, which consists only of ASCII characters.
An example of a non-ASCII character is ♥ which, to be used within a host name, needs to be converted into xn--g6h.
Non-ASCII characters in other parts of a URL are instead percent-encoded as UTF-8 bytes.
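The Punycode conversion can be reproduced with Python's built-in `punycode` codec. This is a minimal sketch: the codec produces only the encoded label, so the `xn--` prefix that marks an encoded host label is added by hand here.

```python
heart = "♥"

# The raw Punycode form of the character.
encoded = heart.encode("punycode").decode("ascii")
print(encoded)  # g6h

# Host labels carry the "xn--" prefix to signal that they are Punycode.
ace_label = "xn--" + encoded
print(ace_label)  # xn--g6h

# Decoding strips the prefix and reverses the conversion.
decoded = ace_label[len("xn--"):].encode("ascii").decode("punycode")
print(decoded)  # ♥
```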
An unsafe ASCII character used within a URL needs to be percent-encoded, that is, replaced by a % prefix followed by two hexadecimal digits.
An example of an unsafe ASCII character is the space character, which needs to be encoded as %20 (or as + in form-encoded query strings).
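Both encodings of the space character can be tried out with the standard `urllib.parse` functions `quote` and `quote_plus`. The input strings below are arbitrary examples.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "what is http"

# Percent-encoding: each space becomes %20.
percent = quote(text)
print(percent)  # what%20is%20http

# Form-style encoding, typical for query strings: each space becomes +.
plus = quote_plus(text)
print(plus)  # what+is+http

# Both forms decode back to the original text.
print(unquote(percent))    # what is http
print(unquote_plus(plus))  # what is http

# Reserved delimiters are encoded as well when they are meant as data.
reserved = quote("a&b=c")
print(reserved)  # a%26b%3Dc
```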
IRI (Internationalized Resource Identifier)
Most modern browsers support IRIs.
For the purpose of Internet transmission, non-ASCII characters in the host name of an IRI are converted into so-called Punycode, which consists only of ASCII characters.
An example of a non-ASCII Unicode character is 人 which, when used as a host name label, needs to be converted into xn--gmq.
A domain name in IRI with internationalized characters is known as an Internationalized Domain Name (IDN).
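Converting an IRI into an ASCII-only URL can be sketched as follows. The helper `iri_to_uri` is a hypothetical name, and the sketch is deliberately simplified: it assumes the IRI has a host and no userinfo or port, and it relies on Python's built-in `idna` codec (which implements the older IDNA 2003 rules) for the host.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (simplified sketch).

    Assumes a host is present and that there is no userinfo or port.
    """
    parts = urlsplit(iri)
    # Host labels with non-ASCII characters become "xn--" Punycode labels.
    host = parts.hostname.encode("idna").decode("ascii")
    # Other components are percent-encoded as UTF-8 bytes; "%" is kept
    # as-is so existing escapes are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


# Hypothetical IRI using the same character in the host and in the path.
print(iri_to_uri("https://人.example/人"))
# https://xn--gmq.example/%E4%BA%BA
```

The same character ends up in two different forms: Punycode in the domain name and percent-encoded UTF-8 in the path.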