Chapter 32. URIs


x::uriimpl is a class that represents a URI as defined in RFC 3986.

#include <x/uriimpl.H>

x::uriimpl u("http://uid:pw@host/path?query#fragment");

std::cout << "scheme: " << u.get_scheme() << std::endl
	  << "authority: " << (u.get_authority() ?
			       u.get_authority().to_string()
			       : std::string("(null)")) << std::endl
	  << "path: " << u.get_path() << std::endl
	  << "query: " << u.get_query() << std::endl
	  << "fragment: " << u.get_fragment() << std::endl;
      

An x::uriimpl may also be constructed from an input sequence defined by a beginning and an ending iterator.

x::uriimpl u(str.begin(), str.end());

The constructors throw an exception if the passed string cannot be parsed as a URI. get_scheme(), get_authority(), get_path(), get_query(), and get_fragment() retrieve the corresponding parts of the URI.

get_authority() returns a reference to an x::uriimpl::authority_t, a class that's convertible to a bool indicating whether the URI includes an authority part. The other get methods return a std::string, which is empty if the URI did not have the corresponding part.
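RFC 3986 (Appendix B) gives a regular expression that splits a URI reference into these five parts. The following is a minimal sketch built on that expression using only the standard library. uri_parts_sketch and split_uri_sketch() are hypothetical names chosen for this illustration; they are not part of x::uriimpl, and the library's own parser may work differently and performs validation that this sketch does not.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the five generic URI components.
struct uri_parts_sketch {
    std::string scheme;
    bool has_authority = false;   // true if "//" was present
    std::string authority;
    std::string path;
    std::string query;
    std::string fragment;
};

// Split a URI reference with the regular expression from RFC 3986,
// Appendix B. No validation of the individual parts is attempted.
uri_parts_sketch split_uri_sketch(const std::string &uri)
{
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

    uri_parts_sketch parts;
    std::smatch m;

    if (std::regex_match(uri, m, re)) {
        parts.scheme = m[2].str();
        parts.has_authority = m[3].matched;
        parts.authority = m[4].str();
        parts.path = m[5].str();
        parts.query = m[7].str();
        parts.fragment = m[9].str();
    }
    return parts;
}
```

Calling split_uri_sketch("http://uid:pw@host/path?query#fragment") yields the scheme http, the authority uid:pw@host, the path /path, the query query, and the fragment fragment.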

x::uriimpl::authority_t has three fields:

userinfo

The userinfo portion of the authority.

has_userinfo

This bool is true if the authority specifies a userinfo part.

An empty string in userinfo does not necessarily indicate that the authority did not have a userinfo part. The strict syntax allows an empty userinfo to be specified. If has_userinfo is true, and userinfo is an empty string, the authority had a @ character with nothing to its left.

hostport

The host portion of the authority, with an optional :port suffix.
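To make the relationship between the three fields concrete, here is a small standard-library sketch that splits an authority string the same way. authority_sketch and split_authority_sketch() are hypothetical names used only for illustration; they mirror the field names above but are not the library's x::uriimpl::authority_t.

```cpp
#include <string>

// Hypothetical stand-in with the same three fields as described above.
struct authority_sketch {
    std::string userinfo;
    bool has_userinfo = false;
    std::string hostport;
};

// Split "userinfo@host:port" at the '@' character, if there is one.
authority_sketch split_authority_sketch(const std::string &authority)
{
    authority_sketch a;
    std::string::size_type at = authority.find('@');

    if (at == std::string::npos) {
        // No '@': the whole string is the host with an optional :port.
        a.hostport = authority;
    } else {
        // '@' present: userinfo may still be empty, as in "@host".
        a.has_userinfo = true;
        a.userinfo = authority.substr(0, at);
        a.hostport = authority.substr(at + 1);
    }
    return a;
}
```

With this sketch, "@host" produces has_userinfo set to true with an empty userinfo, while "host" produces has_userinfo set to false. That is the distinction the has_userinfo field exists to preserve.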

set_scheme(), set_authority(), set_path(), set_query(), and set_fragment() replace the corresponding part of the URI. Each one, including set_authority(), takes a std::string parameter that specifies the new value. An exception is thrown if the passed string contains characters that are not allowed in that part of the URI.

An x::uriimpl may hold an absolute or a relative URI. The += and + operators combine two URIs.

x::uriimpl absuri("http://example.com/cgi-bin/printenv.cgi");

absuri += x::uriimpl("../images");

std::string str;

absuri.to_string(std::back_insert_iterator<std::string>(str));
      

to_string() formats the URI as a string, and writes it to the given output iterator.
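As a rough illustration of what combining an absolute URI with a relative one involves, RFC 3986 (section 5.2) describes merging the relative path with the base path and then removing "." and ".." segments. The sketch below follows that description for the path component only. remove_dot_segments_sketch() and resolve_path_sketch() are hypothetical helpers, not part of x::uriimpl, and the sketch assumes that the base path is absolute; the library's own implementation may differ.

```cpp
#include <string>
#include <vector>

// Collapse "." and ".." segments in an absolute path (one starting with '/').
std::string remove_dot_segments_sketch(const std::string &path)
{
    std::vector<std::string> kept;
    std::string::size_type pos = 1;    // skip the leading '/'

    while (true) {
        std::string::size_type next = path.find('/', pos);
        bool last = (next == std::string::npos);
        std::string seg =
            path.substr(pos, last ? std::string::npos : next - pos);

        if (seg == "..") {
            if (!kept.empty())
                kept.pop_back();       // step up one level, if possible
            if (last)
                kept.push_back("");    // keep a trailing '/'
        } else if (seg == ".") {
            if (last)
                kept.push_back("");
        } else {
            kept.push_back(seg);
        }
        if (last)
            break;
        pos = next + 1;
    }

    std::string result;
    for (const auto &s : kept) {
        result += '/';
        result += s;
    }
    return result;
}

// Resolve a relative path against an absolute base path.
std::string resolve_path_sketch(const std::string &base_path,
                                const std::string &ref_path)
{
    if (ref_path.empty())
        return base_path;                        // nothing to merge
    if (ref_path[0] == '/')
        return remove_dot_segments_sketch(ref_path);

    // Drop everything after the last '/' in the base, then append.
    std::string::size_type slash = base_path.rfind('/');
    std::string dir = (slash == std::string::npos)
        ? std::string("/") : base_path.substr(0, slash + 1);
    return remove_dot_segments_sketch(dir + ref_path);
}
```

Under these rules, resolve_path_sketch("/cgi-bin/printenv.cgi", "../images") returns "/images", so the example above would be expected to produce http://example.com/images.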

x::uriimpl defines all comparison operators, so this class may be used as a key in an associative container. As specified by the RFC, the URI scheme and the host component of the authority are case-insensitive. The comparison has no knowledge of scheme-specific semantics, so all other parts of a URI are considered case-sensitive.
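The comparison rule can be pictured with a small standard-library sketch: lowercase the scheme and the host before comparing, and compare everything else as is. uri_key_sketch and ascii_lower_sketch() are hypothetical names for this illustration, and the three-field structure is a simplification; this is not how x::uriimpl is implemented.

```cpp
#include <string>
#include <tuple>

// Hypothetical, simplified URI with only three of its parts.
struct uri_key_sketch {
    std::string scheme;
    std::string host;
    std::string path;
};

// Lowercase ASCII letters only; other bytes are left untouched.
std::string ascii_lower_sketch(std::string s)
{
    for (char &c : s) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return s;
}

// Strict weak ordering: scheme and host compare case-insensitively,
// the path compares case-sensitively.
bool operator<(const uri_key_sketch &a, const uri_key_sketch &b)
{
    return std::make_tuple(ascii_lower_sketch(a.scheme),
                           ascii_lower_sketch(a.host), a.path)
         < std::make_tuple(ascii_lower_sketch(b.scheme),
                           ascii_lower_sketch(b.host), b.path);
}
```

With this ordering, HTTP://Example.COM/Path and http://example.com/Path are equivalent keys in a std::map or std::set, while http://example.com/path is a different key.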

#include <x/uriimpl.H>
#include <x/http/form.H>

x::uriimpl u("http://host/path?parameter=value");

auto form=u.get_form();

for (const auto &param: *form)
{
    std::cout << param.first << "=" << param.second << std::endl;
}

get_form() invokes get_query() and returns an x::http::form::parameters.
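Conceptually, turning a query string into form parameters means splitting it on & and then on the first = in each piece. The following standard-library sketch shows only that splitting step. parse_query_sketch() is a hypothetical helper, not part of the library; it returns a std::multimap rather than an x::http::form::parameters, and it deliberately skips the percent-decoding that a real form parser needs.

```cpp
#include <map>
#include <string>

// Split "a=1&b=2" into name/value pairs. Repeated names are kept.
// No percent-decoding or '+' handling is performed in this sketch.
std::multimap<std::string, std::string>
parse_query_sketch(const std::string &query)
{
    std::multimap<std::string, std::string> params;
    std::string::size_type pos = 0;

    while (pos < query.size()) {
        std::string::size_type amp = query.find('&', pos);
        if (amp == std::string::npos)
            amp = query.size();

        std::string pair = query.substr(pos, amp - pos);
        if (!pair.empty()) {
            std::string::size_type eq = pair.find('=');
            if (eq == std::string::npos)
                params.emplace(pair, "");          // name without a value
            else
                params.emplace(pair.substr(0, eq), pair.substr(eq + 1));
        }
        pos = amp + 1;
    }
    return params;
}
```

For the query in the example above, parse_query_sketch("parameter=value") returns a single pair whose name is parameter and whose value is value.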

Using international domain names

#include <x/uriimpl.H>
#include <x/locale.H>

x::uriimpl u("http://привет.example.com/path", x::locale::base::utf8());

std::cout << u.get_host_port().first << std::endl;

To construct a URI that uses an international domain name, pass a second parameter to x::uriimpl's constructor. This parameter specifies the locale whose character set encodes the international domain name. An optional third parameter specifies LibIDN conversion flags: IDNA_ALLOW_UNASSIGNED and IDNA_USE_STD3_ASCII_RULES.

The international domain name is stored in its ASCII-compatible encoding, so the above example writes xn--b1agh1afp.example.com to standard output.
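The ASCII-compatible encoding of a label is the xn-- prefix followed by the label's Punycode form (RFC 3492). The sketch below implements only the Punycode encoding step for a single label given as Unicode code points, to show where a string such as xn--b1agh1afp comes from. punycode_label_sketch() and its helpers are hypothetical names; the sketch omits the normalization and validation that LibIDN performs, as well as overflow checks, so it is not a substitute for the library.

```cpp
#include <cstdint>
#include <string>

namespace punycode_sketch {

// Parameters fixed by RFC 3492.
const std::uint32_t base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700;

// Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
char encode_digit(std::uint32_t d)
{
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation function from RFC 3492, section 6.1.
std::uint32_t adapt(std::uint32_t delta, std::uint32_t numpoints, bool first)
{
    delta = first ? delta / damp : delta / 2;
    delta += delta / numpoints;
    std::uint32_t k = 0;
    while (delta > ((base - tmin) * tmax) / 2) {
        delta /= base - tmin;
        k += base;
    }
    return k + (base - tmin + 1) * delta / (delta + skew);
}

} // namespace punycode_sketch

// Encode one label. All-ASCII labels are returned unchanged.
std::string punycode_label_sketch(const std::u32string &label)
{
    using namespace punycode_sketch;

    std::string out;
    for (char32_t c : label)
        if (c < 0x80)
            out += static_cast<char>(c);      // copy basic code points

    std::uint32_t b = static_cast<std::uint32_t>(out.size()), h = b;
    if (b == label.size())
        return out;
    if (b > 0)
        out += '-';                           // delimiter after basic part

    std::uint32_t n = 0x80, delta = 0, bias = 72;
    while (h < label.size()) {
        std::uint32_t m = 0xFFFFFFFF;         // smallest code point >= n
        for (char32_t c : label)
            if (c >= n && c < m)
                m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : label) {
            if (c < n)
                ++delta;
            if (c == n) {
                std::uint32_t q = delta;      // emit delta as a variable-length integer
                for (std::uint32_t k = base; ; k += base) {
                    std::uint32_t t = k <= bias ? tmin
                        : (k >= bias + tmax ? tmax : k - bias);
                    if (q < t)
                        break;
                    out += encode_digit(t + (q - t) % (base - t));
                    q = (q - t) / (base - t);
                }
                out += encode_digit(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return "xn--" + out;
}
```

Passing the code points of привет, written as U"\u043f\u0440\u0438\u0432\u0435\u0442", returns "xn--b1agh1afp", which matches the first label of the host name shown above.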

std::string uri_utf8=u.to_stringi18n(x::locale::base::utf8());

to_stringi18n() returns the URI as a string. An international domain name in the URI gets converted from its ASCII-compatible encoded representation to the character set specified by the locale parameter.

#include <x/idn.H>

std::string i18n;
std::string str=x::idn::to_ascii(i18n, x::locale::base::environment());

i18n=x::idn::from_ascii(str, x::locale::base::environment());

idn.H defines low-level functions for converting strings to and from the ASCII-compatible encoding that's used with international domain names. The overloaded to_ascii() functions convert an international domain name encoded in the locale's codeset to its ASCII-compatible encoding, and from_ascii() does the reverse.