1 /****************************************************************************
3 ** Copyright (C) 2012 Nokia Corporation and/or its subsidiary(-ies).
4 ** Copyright (C) 2012 Intel Corporation.
5 ** All rights reserved.
6 ** Contact: http://www.qt-project.org/
8 ** This file is part of the QtCore module of the Qt Toolkit.
10 ** $QT_BEGIN_LICENSE:LGPL$
11 ** GNU Lesser General Public License Usage
12 ** This file may be used under the terms of the GNU Lesser General Public
13 ** License version 2.1 as published by the Free Software Foundation and
14 ** appearing in the file LICENSE.LGPL included in the packaging of this
15 ** file. Please review the following information to ensure the GNU Lesser
16 ** General Public License version 2.1 requirements will be met:
17 ** http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html.
19 ** In addition, as a special exception, Nokia gives you certain additional
20 ** rights. These rights are described in the Nokia Qt LGPL Exception
21 ** version 1.1, included in the file LGPL_EXCEPTION.txt in this package.
23 ** GNU General Public License Usage
24 ** Alternatively, this file may be used under the terms of the GNU General
25 ** Public License version 3.0 as published by the Free Software Foundation
26 ** and appearing in the file LICENSE.GPL included in the packaging of this
27 ** file. Please review the following information to ensure the GNU General
28 ** Public License version 3.0 requirements will be met:
29 ** http://www.gnu.org/copyleft/gpl.html.
32 ** Alternatively, this file may be used in accordance with the terms and
33 ** conditions contained in a signed written agreement between you and Nokia.
41 ****************************************************************************/
46 \brief The QUrl class provides a convenient interface for working
55 It can parse and construct URLs in both encoded and unencoded
56 form. QUrl also has support for internationalized domain names
59 The most common way to use QUrl is to initialize it via the
60 constructor by passing a QString. Otherwise, setUrl() and
61 setEncodedUrl() can also be used.
63 URLs can be represented in two forms: encoded or unencoded. The
64 unencoded representation is suitable for showing to users, but
65 the encoded representation is typically what you would send to
    a web server. For example, the unencoded URL
    "http://b\uuml\c{}hler.example.com/List of applicants.xml" would be
    sent to the server as
    "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
70 A URL can also be constructed piece by piece by calling
71 setScheme(), setUserName(), setPassword(), setHost(), setPort(),
72 setPath(), setEncodedQuery() and setFragment(). Some convenience
73 functions are also available: setAuthority() sets the user name,
74 password, host and port. setUserInfo() sets the user name and
    Call isValid() to check if the URL is valid. This can be done at
    any point while the URL is being constructed.
80 Constructing a query is particularly convenient through the use
81 of setQueryItems(), addQueryItem() and removeQueryItem(). Use
82 setQueryDelimiters() to customize the delimiters used for
83 generating the query string.
85 For the convenience of generating encoded URL strings or query
86 strings, there are two static functions called
87 fromPercentEncoding() and toPercentEncoding() which deal with
88 percent encoding and decoding of QStrings.
90 Calling isRelative() will tell whether or not the URL is
91 relative. A relative URL can be resolved by passing it as argument
92 to resolved(), which returns an absolute URL. isParentOf() is used
93 for determining whether one URL is a parent of another.
95 fromLocalFile() constructs a QUrl by parsing a local
96 file path. toLocalFile() converts a URL to a local file path.
98 The human readable representation of the URL is fetched with
99 toString(). This representation is appropriate for displaying a
100 URL to a user in unencoded form. The encoded form however, as
101 returned by toEncoded(), is for internal use, passing to web
102 servers, mail clients and so on.
104 QUrl conforms to the URI specification from
105 \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
106 scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
107 folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
108 Profile for Internationalized Domain Names (IDN)).
110 \section2 Character Conversions
112 Follow these rules to avoid erroneous character conversion when
113 dealing with URLs and strings:
    \li When creating a QString to contain a URL from a QByteArray or a
       char*, always use QString::fromUtf8().
    \li Favor the use of QUrl::fromEncoded() and QUrl::toEncoded() instead of
119 QUrl(string) and QUrl::toString() when converting a QUrl to or from
127 \enum QUrl::ParsingMode
129 The parsing mode controls the way QUrl parses strings.
131 \value TolerantMode QUrl will try to correct some common errors in URLs.
132 This mode is useful when processing URLs entered by
135 \value StrictMode Only valid URLs are accepted. This mode is useful for
136 general URL validation.
138 In TolerantMode, the parser corrects the following invalid input:
142 \li Spaces and "%20": If an encoded URL contains a space, this will be
143 replaced with "%20". If a decoded URL contains "%20", this will be
144 replaced with a single space before the URL is parsed.
146 \li Single "%" characters: Any occurrences of a percent character "%" not
147 followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
148 will be replaced by "%25".
150 \li Reserved and unreserved characters: An encoded URL should only
151 contain a few characters as literals; all other characters should
152 be percent-encoded. In TolerantMode, these characters will be
153 automatically percent-encoded where they are not allowed:
154 space / double-quote / "<" / ">" / "[" / "\" /
155 "]" / "^" / "`" / "{" / "|" / "}"
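    The first two TolerantMode corrections listed above can be pictured with
    a small standalone sketch. It uses plain standard C++ rather than QUrl,
    the helper name tolerantFixups() is hypothetical, and it is only meant to
    illustrate the rules as documented, not how the parser applies them.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of QUrl: escapes spaces and stray '%' signs
// in an encoded URL string, following the TolerantMode description.
std::string tolerantFixups(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == ' ') {
            out += "%20";
        } else if (c == '%') {
            // A well-formed escape is '%' followed by two hexadecimal digits.
            const bool wellFormed = i + 2 < in.size()
                && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
                && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
            out += wellFormed ? "%" : "%25";
        } else {
            out += c;
        }
    }
    return out;
}
```

    With this sketch, "13% coverage.html" becomes "13%25%20coverage.html",
    while an input that is already well formed, such as "a%20b", is returned
    unchanged.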
161 \enum QUrl::FormattingOption
163 The formatting options define how the URL is formatted when written out
166 \value None The format of the URL is unchanged.
167 \value RemoveScheme The scheme is removed from the URL.
168 \value RemovePassword Any password in the URL is removed.
169 \value RemoveUserInfo Any user information in the URL is removed.
170 \value RemovePort Any specified port is removed from the URL.
171 \value RemoveAuthority
172 \value RemovePath The URL's path is removed, leaving only the scheme,
173 host address, and port (if present).
174 \value RemoveQuery The query part of the URL (following a '?' character)
176 \value RemoveFragment
177 \value PreferLocalFile If the URL is a local file according to isLocalFile()
178 and contains no query or fragment, a local file path is returned.
179 \value StripTrailingSlash The trailing slash is removed if one is present.
181 Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
182 conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.
187 \fn uint qHash(const QUrl &url)
191 Computes a hash key from the normalized version of \a url.
195 #include "qplatformdefs.h"
197 #include "qstringlist.h"
199 #include "qdir.h" // for QDir::fromNativeSeparators
200 #include "qtldurl_p.h"
201 #include "private/qipaddress_p.h"
202 #include "qurlquery.h"
203 #if defined(Q_OS_WINCE_WM)
204 #pragma optimize("g", off)
209 inline static bool isHex(char c)
212 return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
215 static inline char toHex(quint8 c)
217 return c > 9 ? c - 10 + 'A' : c + '0';
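/*
Helpers such as isHex() and toHex() are the usual building blocks of
percent-encoding. The following is a minimal standalone sketch of how they
are typically combined. It uses std::string instead of the Qt types, all
function names are hypothetical, and it encodes everything outside the RFC 3986
"unreserved" set, which is an assumption made for the example rather than the
policy of this file.

```cpp
#include <cassert>
#include <string>

bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// 'value' must be in the range 0..15.
char toHexDigit(unsigned value)
{
    return static_cast<char>(value > 9 ? value - 10 + 'A' : value + '0');
}

// 'c' must already have passed isHexDigit().
int fromHexDigit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return (c | 0x20) - 'a' + 10;   // fold A-F onto a-f
}

// Encodes every byte that is not an RFC 3986 "unreserved" character.
std::string percentEncode(const std::string &in)
{
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += toHexDigit(c >> 4);
            out += toHexDigit(c & 0xF);
        }
    }
    return out;
}

// Decodes well-formed "%XY" escapes and copies everything else unchanged.
std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() && isHexDigit(in[i + 1]) && isHexDigit(in[i + 2])) {
            out += static_cast<char>(fromHexDigit(in[i + 1]) * 16 + fromHexDigit(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

For instance, percentEncode("List of applicants.xml") yields
"List%20of%20applicants.xml", and percentDecode() reverses it.
*/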
220 static inline QString ftpScheme()
222 return QStringLiteral("ftp");
225 static inline QString httpScheme()
227 return QStringLiteral("http");
230 static inline QString fileScheme()
232 return QStringLiteral("file");
235 QUrlPrivate::QUrlPrivate()
237 sectionIsPresent(0), sectionHasError(0)
QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
242 : ref(1), port(copy.port),
244 userName(copy.userName),
245 password(copy.password),
249 fragment(copy.fragment),
250 sectionIsPresent(copy.sectionIsPresent),
251 sectionHasError(copy.sectionHasError)
255 void QUrlPrivate::clear()
266 sectionIsPresent = 0;
// From RFC 3986, Appendix A Collected ABNF for URI
272 // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
274 // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
276 // authority = [ userinfo "@" ] host [ ":" port ]
277 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
278 // host = IP-literal / IPv4address / reg-name
281 // reg-name = *( unreserved / pct-encoded / sub-delims )
283 // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
285 // query = *( pchar / "/" / "?" )
287 // fragment = *( pchar / "/" / "?" )
289 // pct-encoded = "%" HEXDIG HEXDIG
291 // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
292 // reserved = gen-delims / sub-delims
293 // gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
294 // sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
295 // / "*" / "+" / "," / ";" / "="
296 // the path component has a complex ABNF that basically boils down to
297 // slash-separated segments of "pchar"
299 // The above is the strict definition of the URL components and it is what we
300 // return encoded as FullyEncoded. However, we store the equivalent to
301 // PrettyDecoded internally, as that is the default formatting mode and most
302 // likely to be used. PrettyDecoded decodes spaces, unicode sequences and
303 // unambiguous delimiters.
// An ambiguous delimiter is a delimiter that, if it appeared decoded, would be
// interpreted as the beginning of a new component. From last to first
// component, they are:
//  - fragment: none, since it's the last.
//  - query: the "#" character is ambiguous, as it starts the fragment. In
//    addition, the "+" character is treated specially, and so should both
//    intra-query delimiters be. Since we don't know which ones they are, we
//    keep all reserved characters untouched.
//  - path: the "#" and "?" characters are ambiguous. In addition to them,
//    the slash itself is considered special.
315 // - host: completely special, see setHost() below.
316 // - password: the "#", "?", "/", and ":" characters are ambiguous
317 // - username: the "#", "?", "/", ":", and "@" characters are ambiguous
318 // - scheme: doesn't accept any delimiter, see setScheme() below.
320 // list the recoding table modifications to be used with the recodeFromUser
321 // function, according to the rules above
323 #define decode(x) ushort(x)
324 #define leave(x) ushort(0x100 | (x))
325 #define encode(x) ushort(0x200 | (x))
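/*
The three macros above pack an action into the high byte of a 16-bit entry and
the affected character into the low byte. The sketch below shows one way such
entries could be built and looked up. It is an illustration only: the enum, the
function names, the zero terminator and the fallback parameter are assumptions
made for this example, and the way qt_urlRecode actually consumes its tables is
not shown in this file.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the macro values: decode = 0x000, leave = 0x100, encode = 0x200.
enum class Action { Decode = 0, Leave = 1, Encode = 2 };

constexpr std::uint16_t makeEntry(Action a, char c)
{
    return static_cast<std::uint16_t>((static_cast<unsigned>(a) << 8)
                                      | static_cast<unsigned char>(c));
}

// Assumption: a table ends with a 0 entry, and a character without an entry
// (or a null table) falls back to a caller-supplied default.
Action lookupAction(const std::uint16_t *table, char c, Action fallback)
{
    if (!table)
        return fallback;
    for (; *table; ++table) {
        if ((*table & 0xFF) == static_cast<unsigned char>(c))
            return static_cast<Action>(*table >> 8);
    }
    return fallback;
}

// Hypothetical table: keep '/' as it is, force '#' to stay encoded, decode '~'.
const std::uint16_t exampleActions[] = {
    makeEntry(Action::Leave, '/'),
    makeEntry(Action::Encode, '#'),
    makeEntry(Action::Decode, '~'),
    0
};
```

Packing both values into one integer keeps each table a flat array of ushort,
so a pointer into the middle of a table (as done with "+ 2" and "+ 4" further
down) selects a suffix of the rules without copying anything.
*/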
327 static const ushort encodedUserNameActions[] = {
328 // first field, everything must be encoded, including the ":"
329 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
339 static const ushort * const prettyUserNameActions = encodedUserNameActions;
340 static const ushort * const decodedUserNameActions = 0;
342 static const ushort encodedPasswordActions[] = {
343 // same as encodedUserNameActions, but decode ":"
344 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
354 static const ushort * const prettyPasswordActions = encodedPasswordActions;
355 static const ushort * const decodedPasswordActions = 0;
357 static const ushort encodedPathActions[] = {
358 // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
368 static const ushort * const prettyPathActions = encodedPathActions + 2; // allow decoding "[" / "]"
369 static const ushort * const decodedPathActions = encodedPathActions + 4; // equivalent to leave('/')
371 static const ushort encodedFragmentActions[] = {
372 // fragment = *( pchar / "/" / "?" )
373 // gen-delims permitted: ":" / "@" / "/" / "?"
374 // -> must encode: "[" / "]" / "#"
375 // HOWEVER: we allow "#" to remain decoded
385 static const ushort * const prettyFragmentActions = 0;
386 static const ushort * const decodedFragmentActions = 0;
388 // the query is handled specially, since we prefer not to transform the delims
389 static const ushort * const encodedQueryActions = encodedFragmentActions + 4; // encode "#" / "[" / "]"
392 static inline QString
393 recode(const QString &input, const ushort *actions, QUrl::ComponentFormattingOptions encoding,
397 const QChar *begin = input.constData() + from;
398 const QChar *end = input.constData() + iend;
399 if (qt_urlRecode(output, begin, end, encoding, actions))
402 return input.mid(from, iend - from);
405 static inline QString
406 recodeFromUser(const QString &input, const ushort *actions, int from, int end)
408 return recode(input, actions,
409 QUrl::DecodeUnicode | QUrl::DecodeAllDelimiters | QUrl::DecodeSpaces,
413 void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options) const
415 if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
416 appendUserInfo(appendTo, options);
418 appendTo += QLatin1Char('@');
420 appendHost(appendTo, options);
421 if (!(options & QUrl::RemovePort) && port != -1)
422 appendTo += QLatin1Char(':') + QString::number(port);
425 void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options) const
    // when constructing the authority or user-info, we never decode the ambiguous delimiters
428 options &= ~(QUrl::DecodeAllDelimiters & ~QUrl::DecodeUnambiguousDelimiters);
430 appendUserName(appendTo, options);
431 if (options & QUrl::RemovePassword || !hasPassword()) {
434 appendTo += QLatin1Char(':');
435 appendPassword(appendTo, options);
439 // appendXXXX functions:
440 // the internal value is already encoded in PrettyDecoded, so that case is easy.
441 // DecodeUnicode and DecodeSpaces are handled by qt_urlRecode.
442 // That leaves these functions to handle three cases related to delimiters:
443 // 1) encoded encodedXXXX tables
444 // 2) DecodeUnambiguousDelimiters prettyXXXX tables
445 // 3) DecodeAllDelimiters decodedXXXX tables
446 static inline void appendToUser(QString &appendTo, const QString &value, QUrl::FormattingOptions options,
447 const ushort *encodedActions, const ushort *prettyActions, const ushort *decodedActions)
449 if (options == QUrl::PrettyDecoded) {
454 const ushort *actions = 0;
455 if ((options & QUrl::DecodeAllDelimiters) == QUrl::DecodeUnambiguousDelimiters) {
456 actions = prettyActions;
457 } else if (options & QUrl::DecodeAllDelimiters) {
458 actions = decodedActions;
459 } else if ((options & QUrl::DecodeAllDelimiters) == 0) {
460 actions = encodedActions;
463 if (!qt_urlRecode(appendTo, value.constData(), value.constData() + value.length(),
468 inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
470 appendToUser(appendTo, userName, options, encodedUserNameActions, prettyUserNameActions, decodedUserNameActions);
473 inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
475 appendToUser(appendTo, password, options, encodedPasswordActions, prettyPasswordActions, decodedPasswordActions);
478 inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options) const
480 appendToUser(appendTo, path, options, encodedPathActions, prettyPathActions, decodedPathActions);
483 inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options) const
485 appendToUser(appendTo, fragment, options, encodedFragmentActions, prettyFragmentActions, decodedFragmentActions);
488 inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options) const
490 // almost the same code as the previous functions
491 // except we prefer not to touch the delimiters
492 if (options == QUrl::PrettyDecoded) {
497 const ushort *actions = 0;
498 if ((options & QUrl::DecodeAllDelimiters) == QUrl::DecodeUnambiguousDelimiters) {
499 // reset to default qt_urlRecode behaviour (leave delimiters alone)
500 options &= ~QUrl::DecodeAllDelimiters;
501 } else if ((options & QUrl::DecodeAllDelimiters) == 0) {
502 actions = encodedQueryActions;
505 if (!qt_urlRecode(appendTo, query.constData(), query.constData() + query.length(),
512 bool QUrlPrivate::setScheme(const QString &value, int len, bool decoded)
514 // schemes are strictly RFC-compliant:
515 // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
516 // but we need to decode any percent-encoding sequences that fall on
518 // we also lowercase the scheme
521 sectionIsPresent |= Scheme;
522 sectionHasError |= Scheme; // assume it has errors, we'll clear before returning true
527 int needsLowercasing = -1;
528 const ushort *p = reinterpret_cast<const ushort *>(value.constData());
529 for (int i = 0; i < len; ++i) {
530 if (p[i] >= 'a' && p[i] <= 'z')
532 if (p[i] >= 'A' && p[i] <= 'Z') {
533 needsLowercasing = i;
536 if (p[i] >= '0' && p[i] <= '9' && i > 0)
538 if (p[i] == '+' || p[i] == '-' || p[i] == '.')
542 // found a percent-encoded sign
543 // if we haven't decoded yet, decode and try again
547 QString decodedScheme;
548 if (qt_urlRecode(decodedScheme, value.constData(), value.constData() + len, 0, 0) == 0)
550 return setScheme(decodedScheme, decodedScheme.length(), true);
553 // found something else
557 scheme = value.left(len);
558 sectionHasError &= ~Scheme;
560 if (needsLowercasing != -1) {
561 // schemes are ASCII only, so we don't need the full Unicode toLower
562 QChar *schemeData = scheme.data(); // force detaching here
563 for (int i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
565 if (c >= 'A' && c <= 'Z')
566 schemeData[i] = c + 0x20;
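/*
The scheme rule quoted at the top of setScheme() can be checked in isolation.
This is a minimal sketch in plain standard C++ that follows the ABNF strictly
and leaves out the percent-decoding retry. The function names are hypothetical
and the code is not a copy of the member function.

```cpp
#include <cassert>
#include <string>

// Returns true if 's' matches: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string &s)
{
    if (s.empty())
        return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool digit = c >= '0' && c <= '9';
        const bool extra = c == '+' || c == '-' || c == '.';
        if (alpha)
            continue;
        if (i > 0 && (digit || extra))
            continue;               // only allowed after the first character
        return false;
    }
    return true;
}

// Schemes are ASCII only, so a fixed offset replaces a full Unicode toLower.
std::string lowercaseScheme(std::string s)
{
    for (char &c : s) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c + 0x20);
    }
    return s;
}
```

Under these rules "svn+ssh" is accepted, "1http" is rejected because a scheme
must start with a letter, and lowercaseScheme("HtTp") returns "http".
*/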
572 bool QUrlPrivate::setAuthority(const QString &auth, int from, int end)
574 sectionHasError &= ~Authority;
575 sectionIsPresent &= ~Authority;
576 sectionIsPresent |= Host;
585 int userInfoIndex = auth.indexOf(QLatin1Char('@'), from);
586 if (uint(userInfoIndex) < uint(end)) {
587 setUserInfo(auth, from, userInfoIndex);
588 from = userInfoIndex + 1;
591 if (userInfoIndex == end - 1) {
592 // authority without a hostname is invalid
596 int colonIndex = auth.lastIndexOf(QLatin1Char(':'), end - 1);
597 if (colonIndex < from)
600 if (uint(colonIndex) < uint(end)) {
601 if (auth.at(from).unicode() == '[') {
602 // check if colonIndex isn't inside the "[...]" part
603 int closingBracket = auth.indexOf(QLatin1Char(']'), from);
604 if (uint(closingBracket) > uint(colonIndex))
609 if (colonIndex == end - 1) {
610 // found a colon but no digits after it
611 sectionHasError |= Port;
612 } else if (uint(colonIndex) < uint(end)) {
614 for (int i = colonIndex + 1; i < end; ++i) {
615 ushort c = auth.at(i).unicode();
616 if (c >= '0' && c <= '9') {
620 sectionHasError |= Port;
621 x = ulong(-1); // x != ushort(x)
631 return setHost(auth, from, qMin<uint>(end, colonIndex)) && !(sectionHasError & Port);
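/*
The splitting done by setAuthority() (user info before the first "@", port after
the last ":" that is not inside an IP literal) can be sketched on its own. The
struct and function below are hypothetical, use std::string, and do not validate
the host, which setHost() does in the real code.

```cpp
#include <cassert>
#include <string>

struct AuthorityParts {
    std::string userInfo;
    std::string host;
    int port = -1;      // -1 means "no port given"
    bool ok = true;     // false if the port is malformed
};

AuthorityParts splitAuthority(const std::string &auth)
{
    AuthorityParts parts;
    std::size_t from = 0;

    // Everything before the first '@' is the user info.
    const std::size_t at = auth.find('@');
    if (at != std::string::npos) {
        parts.userInfo = auth.substr(0, at);
        from = at + 1;
    }

    // The port delimiter is the last ':' after the user info that does not
    // sit inside a "[...]" IP literal.
    std::size_t colon = auth.rfind(':');
    if (colon != std::string::npos && colon < from)
        colon = std::string::npos;          // that ':' belongs to the user info
    if (colon != std::string::npos && auth[from] == '[') {
        const std::size_t closing = auth.find(']', from);
        if (closing == std::string::npos || closing > colon)
            colon = std::string::npos;      // that ':' is part of the IP literal
    }

    if (colon == std::string::npos) {
        parts.host = auth.substr(from);
        return parts;
    }

    parts.host = auth.substr(from, colon - from);
    if (colon + 1 == auth.size()) {
        parts.ok = false;                   // a colon but no digits after it
        return parts;
    }
    unsigned long value = 0;
    for (std::size_t i = colon + 1; i < auth.size(); ++i) {
        const char c = auth[i];
        if (c < '0' || c > '9') {
            parts.ok = false;               // not a number
            return parts;
        }
        value = value * 10 + static_cast<unsigned long>(c - '0');
        if (value > 65535) {
            parts.ok = false;               // does not fit in 16 bits
            return parts;
        }
    }
    parts.port = static_cast<int>(value);
    return parts;
}
```

For "user:pw@example.com:8080" this gives the user info "user:pw", the host
"example.com" and the port 8080, while "[::1]" keeps its colons in the host.
*/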
634 void QUrlPrivate::setUserInfo(const QString &userInfo, int from, int end)
636 int delimIndex = userInfo.indexOf(QLatin1Char(':'), from);
637 setUserName(userInfo, from, qMin<uint>(delimIndex, end));
639 if (delimIndex == -1) {
641 sectionIsPresent &= ~Password;
642 sectionHasError &= ~Password;
644 setPassword(userInfo, delimIndex + 1, end);
648 inline void QUrlPrivate::setUserName(const QString &value, int from, int end)
650 sectionIsPresent |= UserName;
651 sectionHasError &= ~UserName;
652 userName = recodeFromUser(value, prettyUserNameActions, from, end);
655 inline void QUrlPrivate::setPassword(const QString &value, int from, int end)
657 sectionIsPresent |= Password;
658 sectionHasError &= ~Password;
659 password = recodeFromUser(value, prettyPasswordActions, from, end);
662 inline void QUrlPrivate::setPath(const QString &value, int from, int end)
664 // sectionIsPresent |= Path; // not used, save some cycles
665 sectionHasError &= ~Path;
666 path = recodeFromUser(value, prettyPathActions, from, end);
669 // check for the "path-noscheme" case
670 // if the path contains a ":" before the first "/", it could be misinterpreted
674 inline void QUrlPrivate::setFragment(const QString &value, int from, int end)
676 sectionIsPresent |= Fragment;
677 sectionHasError &= ~Fragment;
678 fragment = recodeFromUser(value, prettyFragmentActions, from, end);
681 inline void QUrlPrivate::setQuery(const QString &value, int from, int iend)
683 sectionIsPresent |= Query;
684 sectionHasError &= ~Query;
686 // use the default actions for the query
687 static const ushort decodeActions[] = {
701 const QChar *begin = value.constData() + from;
702 const QChar *end = value.constData() + iend;
703 if (qt_urlRecode(output, begin, end, QUrl::DecodeUnicode | QUrl::DecodeSpaces,
707 query = value.mid(from, iend - from);
711 // The RFC says the host is:
712 // host = IP-literal / IPv4address / reg-name
713 // IP-literal = "[" ( IPv6address / IPvFuture ) "]"
714 // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
715 // [a strict definition of IPv6Address and IPv4Address]
716 // reg-name = *( unreserved / pct-encoded / sub-delims )
718 // We deviate from the standard in all but IPvFuture. For IPvFuture we accept
719 // and store only exactly what the RFC says we should. No percent-encoding is
720 // permitted in this field, so Unicode characters and space aren't either.
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, we comply with the RFC and
// it's exclusively composed of unreserved characters.
727 // For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
728 // addresses, the so-called v4-compat and v4-mapped addresses. We also store
729 // those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. They also
// require no transformation of any kind.
733 // As for registered names, it's the other way around: we accept only valid
734 // hostnames as specified by STD 3 and IDNA. That means everything we accept is
735 // valid in the RFC definition above, but there are many valid reg-names
736 // according to the RFC that we do not accept in the name of security. Since we
737 // do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
738 // specified by the DecodeUnicode flag. The hostname is stored in its Unicode form.
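/*
To make the IPv4 paragraph above concrete, here is a small standalone sketch of
inet_aton-style leniency: fewer than four fields are accepted and the last field
fills the remaining low-order bytes. It is a simplification under stated
assumptions: only decimal fields are handled, the function names are
hypothetical, and it is not the QIPAddressUtils implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parses 1 to 4 dot-separated decimal fields. Returns false on bad input, in
// which case 'address' must not be used.
bool parseLenientIp4(const std::string &in, std::uint32_t &address)
{
    std::uint64_t fields[4];
    int count = 0;
    std::size_t pos = 0;
    while (true) {
        if (count == 4 || pos >= in.size() || in[pos] < '0' || in[pos] > '9')
            return false;                   // too many fields, or no digit here
        std::uint64_t value = 0;
        while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
            value = value * 10 + static_cast<unsigned>(in[pos++] - '0');
            if (value > 0xFFFFFFFFu)
                return false;
        }
        fields[count++] = value;
        if (pos == in.size())
            break;
        if (in[pos++] != '.')
            return false;
    }

    // All fields but the last must fit in one byte; the last one must fit in
    // the bytes that are left over.
    address = 0;
    for (int i = 0; i < count - 1; ++i) {
        if (fields[i] > 0xFF)
            return false;
        address |= static_cast<std::uint32_t>(fields[i]) << (24 - 8 * i);
    }
    const int remainingBits = 8 * (4 - (count - 1));
    if (remainingBits < 32 && fields[count - 1] >= (std::uint64_t(1) << remainingBits))
        return false;
    address |= static_cast<std::uint32_t>(fields[count - 1]);
    return true;
}

std::string ip4ToString(std::uint32_t address)
{
    return std::to_string(address >> 24) + '.' + std::to_string((address >> 16) & 0xFF) + '.'
        + std::to_string((address >> 8) & 0xFF) + '.' + std::to_string(address & 0xFF);
}

// Convenience wrapper: returns the corrected dotted form, or "" on failure.
std::string normalizeIp4(const std::string &in)
{
    std::uint32_t address = 0;
    return parseLenientIp4(in, address) ? ip4ToString(address) : std::string();
}
```

With this sketch the short form "10.1" is corrected to "10.0.0.1", which is the
kind of normalized value the comment above says is stored.
*/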
740 inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
742 // this is the only flag that matters
743 options &= QUrl::DecodeUnicode;
746 if (host.at(0).unicode() == '[') {
747 // IPv6Address and IPvFuture address never require any transformation
750 // this is either an IPv4Address or a reg-name
751 // if it is a reg-name, it is already stored in Unicode form
752 if (options == QUrl::DecodeUnicode)
755 appendTo += qt_ACE_do(host, ToAceOnly);
759 // the whole IPvFuture is passed and parsed here, including brackets
760 static bool parseIpFuture(QString &host, const QChar *begin, const QChar *end)
762 // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
763 static const char acceptable[] =
764 "!$&'()*+,;=" // sub-delims
766 "-._~"; // unreserved
768 // the brackets and the "v" have been checked
769 if (begin[3].unicode() != '.')
    if ((begin[2].unicode() >= 'A' && begin[2].unicode() <= 'F') ||
772 (begin[2].unicode() >= 'a' && begin[2].unicode() <= 'f') ||
773 (begin[2].unicode() >= '0' && begin[2].unicode() <= '9')) {
774 // this is so unlikely that we'll just go down the slow path
775 // decode the whole string, skipping the "[vH." and "]" which we already know to be there
776 host += QString::fromRawData(begin, 4);
781 if (qt_urlRecode(decoded, begin, end, QUrl::FullyEncoded, 0)) {
782 begin = decoded.constBegin();
783 end = decoded.constEnd();
786 for ( ; begin != end; ++begin) {
787 if (begin->unicode() >= 'A' && begin->unicode() <= 'Z')
789 else if (begin->unicode() >= 'a' && begin->unicode() <= 'z')
791 else if (begin->unicode() >= '0' && begin->unicode() <= '9')
793 else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != 0)
798 host += QLatin1Char(']');
804 // ONLY the IPv6 address is parsed here, WITHOUT the brackets
805 static bool parseIp6(QString &host, const QChar *begin, const QChar *end)
807 QIPAddressUtils::IPv6Address address;
808 if (!QIPAddressUtils::parseIp6(address, begin, end)) {
809 // IPv6 failed parsing, check if it was a percent-encoded character in
810 // the middle and try again
812 if (!qt_urlRecode(decoded, begin, end, QUrl::FullyEncoded, 0)) {
813 // no transformation, nothing to re-parse
818 // if the parsing fails again, the qt_urlRecode above will return 0
819 return parseIp6(host, decoded.constBegin(), decoded.constEnd());
822 host.reserve(host.size() + (end - begin));
823 host += QLatin1Char('[');
824 QIPAddressUtils::toString(host, address);
825 host += QLatin1Char(']');
829 bool QUrlPrivate::setHost(const QString &value, int from, int iend, bool maybePercentEncoded)
831 const QChar *begin = value.constData() + from;
832 const QChar *end = value.constData() + iend;
834 const int len = end - begin;
836 sectionIsPresent |= Host;
838 sectionHasError &= ~Host;
842 // we'll clear just before returning true
843 sectionHasError |= Host;
845 if (begin[0].unicode() == '[') {
846 // IPv6Address or IPvFuture
847 // smallest IPv6 address is "[::]" (len = 4)
848 // smallest IPvFuture address is "[v7.X]" (len = 6)
849 if (end[-1].unicode() != ']')
853 if (len > 5 && begin[1].unicode() == 'v')
854 ok = parseIpFuture(host, begin, end);
856 ok = parseIp6(host, begin + 1, end - 1);
859 sectionHasError &= ~Host;
863 // check if it's an IPv4 address
864 QIPAddressUtils::IPv4Address ip4;
865 if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
867 QIPAddressUtils::toString(host, ip4);
868 sectionHasError &= ~Host;
    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded, becomes one
    // of the types above.
876 // Two types of encoding are possible:
877 // percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
878 // Unicode encoding (some non-ASCII characters case-fold to digits
879 // when nameprepping is done)
881 // The qt_ACE_do function below applies nameprepping and the STD3 check.
882 // That means a Unicode string may become an IPv4 address, but it cannot
883 // produce a '[' or a '%'.
885 // check for percent-encoding first
887 if (maybePercentEncoded && qt_urlRecode(s, begin, end, QUrl::MostDecoded, 0)) {
888 // something was decoded
889 // anything encoded left?
890 if (s.contains(QChar(0x25))) // '%'
894 return setHost(s, 0, s.length(), false);
897 s = qt_ACE_do(QString::fromRawData(begin, len), NormalizeAce);
902 if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
903 QIPAddressUtils::toString(host, ip4);
907 sectionHasError &= ~Host;
911 void QUrlPrivate::parse(const QString &url)
913 // URI-reference = URI / relative-ref
914 // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
915 // relative-ref = relative-part [ "?" query ] [ "#" fragment ]
916 // hier-part = "//" authority path-abempty
917 // / other path types
918 // relative-part = "//" authority path-abempty
919 // / other path types here
921 sectionIsPresent = 0;
924 // find the important delimiters
928 const int len = url.length();
929 const QChar *const begin = url.constData();
930 const ushort *const data = reinterpret_cast<const ushort *>(begin);
932 for (int i = 0; i < len; ++i) {
933 if (data[i] == '#' && hash == -1) {
936 // nothing more to be found
940 if (question == -1) {
941 if (data[i] == ':' && colon == -1)
943 else if (data[i] == '?')
948 // check if we have a scheme
950 if (colon != -1 && setScheme(url, colon)) {
951 hierStart = colon + 1;
953 // recover from a failed scheme: it might not have been a scheme at all
956 sectionIsPresent = 0;
960 int hierEnd = qMin<uint>(qMin<uint>(question, hash), len);
961 if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
962 // we have an authority, it ends at the first slash after these
963 int authorityEnd = hierEnd;
964 for (int i = hierStart + 2; i < authorityEnd ; ++i) {
965 if (data[i] == '/') {
971 setAuthority(url, hierStart + 2, authorityEnd);
973 // even if we failed to set the authority properly, let's try to recover
974 setPath(url, authorityEnd, hierEnd);
981 if (hierStart < hierEnd)
982 setPath(url, hierStart, hierEnd);
987 if (uint(question) < uint(hash))
988 setQuery(url, question + 1, qMin<uint>(hash, len));
991 setFragment(url, hash + 1, len);
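/*
The delimiter scan in parse() (first ":" before any "?" or "#", first "?"
before "#", first "#") can be sketched as a standalone splitter. This is an
illustration with std::string and hypothetical names. As a simplification, the
text before the first ":" is taken as the scheme whenever no "/" precedes that
colon, whereas the real code asks setScheme() and recovers if it fails.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

struct UrlParts {
    std::string scheme, authority, path, query, fragment;
    bool hasAuthority = false;
};

UrlParts splitUrl(const std::string &url)
{
    const std::size_t npos = std::string::npos;
    const std::size_t len = url.size();

    // Find the important delimiters in one pass.
    std::size_t colon = npos, question = npos, hash = npos;
    for (std::size_t i = 0; i < len; ++i) {
        if (url[i] == '#') {
            hash = i;
            break;                          // nothing more to be found
        }
        if (question == npos) {
            if (url[i] == ':' && colon == npos)
                colon = i;
            else if (url[i] == '?')
                question = i;
        }
    }

    UrlParts parts;
    std::size_t hierStart = 0;
    if (colon != npos && colon > 0 && url.find('/') > colon) {
        parts.scheme = url.substr(0, colon);
        hierStart = colon + 1;
    }

    const std::size_t hierEnd = std::min(std::min(question, hash), len);
    if (hierEnd - hierStart >= 2 && url[hierStart] == '/' && url[hierStart + 1] == '/') {
        // The authority ends at the first slash after the "//".
        std::size_t authorityEnd = url.find('/', hierStart + 2);
        if (authorityEnd == npos || authorityEnd > hierEnd)
            authorityEnd = hierEnd;
        parts.hasAuthority = true;
        parts.authority = url.substr(hierStart + 2, authorityEnd - hierStart - 2);
        parts.path = url.substr(authorityEnd, hierEnd - authorityEnd);
    } else {
        parts.path = url.substr(hierStart, hierEnd - hierStart);
    }

    if (question != npos)
        parts.query = url.substr(question + 1, std::min(hash, len) - question - 1);
    if (hash != npos)
        parts.fragment = url.substr(hash + 1);
    return parts;
}
```

Because an absent delimiter is stored as the largest possible index, taking the
minimum of the query and fragment positions and the length gives the end of the
hierarchical part without special cases, which mirrors the qMin<uint> calls above.
*/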
995 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths
997 Returns a merge of the current path with the relative path passed
1000 Note: \a relativePath is relative (does not start with '/').
1002 QString QUrlPrivate::mergePaths(const QString &relativePath) const
1004 // If the base URI has a defined authority component and an empty
1005 // path, then return a string consisting of "/" concatenated with
1006 // the reference's path; otherwise,
1007 if (!host.isEmpty() && path.isEmpty())
1008 return QLatin1Char('/') + relativePath;
1010 // Return a string consisting of the reference's path component
1011 // appended to all but the last segment of the base URI's path
1012 // (i.e., excluding any characters after the right-most "/" in the
1013 // base URI path, or excluding the entire base URI path if it does
1014 // not contain any "/" characters).
1016 if (!path.contains(QLatin1Char('/')))
1017 newPath = relativePath;
1019 newPath = path.leftRef(path.lastIndexOf(QLatin1Char('/')) + 1) + relativePath;
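/*
The merge rule from RFC 3986, section 5.2.3, can be tried out in isolation. In
this sketch the parameters baseHasAuthority and basePath are hypothetical
stand-ins for the host and path members that the member function reads, and
std::string replaces QString.

```cpp
#include <cassert>
#include <string>

std::string mergePaths(bool baseHasAuthority, const std::string &basePath,
                       const std::string &relativePath)
{
    // Base with an authority and an empty path: "/" plus the reference's path.
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;

    // Otherwise keep everything up to and including the right-most "/".
    const std::size_t lastSlash = basePath.rfind('/');
    if (lastSlash == std::string::npos)
        return relativePath;                // no "/" at all: drop the base path
    return basePath.substr(0, lastSlash + 1) + relativePath;
}
```

For the base path "/b/c/d;p" and the reference "g", the result is "/b/c/g".
*/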
1025 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.4: Remove dot segments
1027 Removes unnecessary ../ and ./ from the path. Used for normalizing
1030 static void removeDotsFromPath(QString *path)
1032 // The input buffer is initialized with the now-appended path
1033 // components and the output buffer is initialized to the empty
1035 QChar *out = path->data();
1036 const QChar *in = out;
1037 const QChar *end = out + path->size();
1039 // If the input buffer consists only of
1040 // "." or "..", then remove that from the input
1042 if (path->size() == 1 && in[0].unicode() == '.')
1044 else if (path->size() == 2 && in[0].unicode() == '.' && in[1].unicode() == '.')
1046 // While the input buffer is not empty, loop:
1049 // otherwise, if the input buffer begins with a prefix of "../" or "./",
1050 // then remove that prefix from the input buffer;
1051 if (path->size() >= 2 && in[0].unicode() == '.' && in[1].unicode() == '/')
1053 else if (path->size() >= 3 && in[0].unicode() == '.'
1054 && in[1].unicode() == '.' && in[2].unicode() == '/')
1057 // otherwise, if the input buffer begins with a prefix of
1058 // "/./" or "/.", where "." is a complete path segment,
1059 // then replace that prefix with "/" in the input buffer;
1060 if (in <= end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1061 && in[2].unicode() == '/') {
1064 } else if (in == end - 2 && in[0].unicode() == '/' && in[1].unicode() == '.') {
1065 *out++ = QLatin1Char('/');
1070 // otherwise, if the input buffer begins with a prefix
1071 // of "/../" or "/..", where ".." is a complete path
1072 // segment, then replace that prefix with "/" in the
        // input buffer and remove the last segment and its
1074 // preceding "/" (if any) from the output buffer;
1075 if (in <= end - 4 && in[0].unicode() == '/' && in[1].unicode() == '.'
1076 && in[2].unicode() == '.' && in[3].unicode() == '/') {
1077 while (out > path->constData() && (--out)->unicode() != '/')
1079 if (out == path->constData() && out->unicode() != '/')
1083 } else if (in == end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1084 && in[2].unicode() == '.') {
1085 while (out > path->constData() && (--out)->unicode() != '/')
1087 if (out->unicode() == '/')
1093 // otherwise move the first path segment in
1094 // the input buffer to the end of the output
1095 // buffer, including the initial "/" character
1096 // (if any) and any subsequent characters up
1097 // to, but not including, the next "/"
1098 // character or the end of the input buffer.
1100 while (in < end && in->unicode() != '/')
1103 path->truncate(out - path->constData());
1107 void QUrlPrivate::validate() const
1109 QUrlPrivate *that = const_cast<QUrlPrivate *>(this);
1110 that->encodedOriginal = that->toEncoded(); // may detach
1113 QURL_SETFLAG(that->stateFlags, Validated);
1118 QString auth = authority(); // causes the non-encoded forms to be valid
1120 // authority() calls canonicalHost() which sets this
1124 if (scheme == QLatin1String("mailto")) {
1125 if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
1126 that->isValid = false;
1127 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username, "
1128 "port and password"),
1131 } else if (scheme == ftpScheme() || scheme == httpScheme()) {
1132 if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
1133 that->isValid = false;
1134 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
1140 const QByteArray &QUrlPrivate::normalized() const
1142 if (QURL_HASFLAG(stateFlags, QUrlPrivate::Normalized))
1143 return encodedNormalized;
1145 QUrlPrivate *that = const_cast<QUrlPrivate *>(this);
1146 QURL_SETFLAG(that->stateFlags, QUrlPrivate::Normalized);
1148 QUrlPrivate tmp = *this;
1149 tmp.scheme = tmp.scheme.toLower();
1150 tmp.host = tmp.canonicalHost();
1152 // ensure the encoded and normalized parts of the URL
1153 tmp.ensureEncodedParts();
1154 if (tmp.encodedUserName.contains('%'))
1155 q_normalizePercentEncoding(&tmp.encodedUserName, userNameExcludeChars);
1156 if (tmp.encodedPassword.contains('%'))
1157 q_normalizePercentEncoding(&tmp.encodedPassword, passwordExcludeChars);
1158 if (tmp.encodedFragment.contains('%'))
1159 q_normalizePercentEncoding(&tmp.encodedFragment, fragmentExcludeChars);
1161 if (tmp.encodedPath.contains('%')) {
1162 // the path is a bit special:
1163 // the slashes shouldn't be encoded or decoded.
1164 // They should remain exactly like they are right now
1166 // treat the path as a slash-separated sequence of pchar
1168 result.reserve(tmp.encodedPath.length());
1169 if (tmp.encodedPath.startsWith('/'))
1172 const char *data = tmp.encodedPath.constData();
1177 nextSlash = tmp.encodedPath.indexOf('/', lastSlash);
1179 if (nextSlash == -1)
1180 len = tmp.encodedPath.length() - lastSlash;
1182 len = nextSlash - lastSlash;
1184 if (memchr(data + lastSlash, '%', len)) {
1185 // there's at least one percent before the next slash
1186 QByteArray block = QByteArray(data + lastSlash, len);
1187 q_normalizePercentEncoding(&block, pathExcludeChars);
1188 result.append(block);
1190 // no percents in this path segment, append wholesale
1191 result.append(data + lastSlash, len);
1194 // append the slash too, if it's there
1195 if (nextSlash != -1)
1198 lastSlash = nextSlash;
1199 } while (lastSlash != -1);
1201 tmp.encodedPath = result;
1204 if (!tmp.scheme.isEmpty()) // relative test
1205 removeDotsFromPath(&tmp.encodedPath);
1207 int qLen = tmp.query.length();
1208 for (int i = 0; i < qLen; i++) {
1209 if (qLen - i > 2 && tmp.query.at(i) == '%') {
1211 tmp.query[i] = qToLower(tmp.query.at(i));
1213 tmp.query[i] = qToLower(tmp.query.at(i));
1216 encodedNormalized = tmp.toEncoded();
1218 return encodedNormalized;
1221 QString QUrlPrivate::createErrorString()
1223 if (isValid && isHostValid)
1226 QString errorString(QLatin1String(QT_TRANSLATE_NOOP(QUrl, "Invalid URL \"")));
1227 errorString += QLatin1String(encodedOriginal.constData());
1228 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, "\""));
1230 if (errorInfo._source) {
1231 int position = encodedOriginal.indexOf(errorInfo._source) - 1;
1233 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, ": error at position "));
1234 errorString += QString::number(position);
1236 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, ": "));
1237 errorString += QLatin1String(errorInfo._source);
1241 if (errorInfo._expected) {
1242 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, ": expected \'"));
1243 errorString += QLatin1Char(errorInfo._expected);
1244 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, "\'"));
1246 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, ": "));
1248 errorString += QLatin1String(errorInfo._message);
1250 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, "invalid hostname"));
1252 if (errorInfo._found) {
1253 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, ", but found \'"));
1254 errorString += QLatin1Char(errorInfo._found);
1255 errorString += QLatin1String(QT_TRANSLATE_NOOP(QUrl, "\'"));
1262 \macro QT_NO_URL_CAST_FROM_STRING
1265 Disables automatic conversions from QString (or char *) to QUrl.
1267 Compiling your code with this define is useful when you have a lot of
1268 code that uses QString for file names and you wish to convert it to
1269 use QUrl for network transparency. In any code that uses QUrl, it can
1270 help avoid missing QUrl::resolved() calls, and other misuses of
1271 QString to QUrl conversions.
1274 url = filename; // probably not what you want
1276 url = QUrl::fromLocalFile(filename);
1277 url = baseurl.resolved(QUrl(filename));
1280 \sa QT_NO_CAST_FROM_ASCII
1285 Constructs a URL by parsing \a url. \a url is assumed to be in human
1286 readable representation, with no percent encoding. QUrl will automatically
1287 percent encode all characters that are not allowed in a URL.
1288 The default parsing mode is TolerantMode.
1290 Parses the \a url using the parser mode \a parsingMode.
1294 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 0
1296 To construct a URL from an encoded string, call fromEncoded():
1298 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 1
1300 \sa setUrl(), setEncodedUrl(), fromEncoded(), TolerantMode
1302 QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(0)
1304 setUrl(url, parsingMode);
1308 Constructs an empty QUrl object.
1315 Constructs a copy of \a other.
1317 QUrl::QUrl(const QUrl &other) : d(other.d)
1324 Destructor; called immediately before the object is deleted.
1328 if (d && !d->ref.deref())
1333 Returns true if the URL is valid; otherwise returns false.
1335 The URL is run through a conformance test. Every part of the URL
1336 must conform to the standard encoding rules of the URI standard
1337 for the URL to be reported as valid.
1339 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 2
1341 bool QUrl::isValid() const
1343 if (!d) return true;
1344 return d->sectionHasError == 0;
1348 Returns true if the URL has no data; otherwise returns false.
1350 bool QUrl::isEmpty() const
1352 if (!d) return true;
1354 // cannot use sectionIsPresent here
1355 // we may have only empty sections present
1356 return d->scheme.isEmpty()
1357 && d->userName.isEmpty()
1358 && d->password.isEmpty()
1359 && d->host.isEmpty()
1361 && d->path.isEmpty()
1362 && d->query.isEmpty()
1363 && d->fragment.isEmpty();
1367 Resets the content of the QUrl. After calling this function, the
1368 QUrl is equal to one that has been constructed with the default
1373 if (d && !d->ref.deref())
1379 Parses \a url using the parsing mode \a parsingMode.
1381 \a url is assumed to be in Unicode format, with no percent
1384 Calling isValid() will tell whether or not a valid URL was
1389 void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
1392 if (parsingMode == StrictMode) {
1393 // ### strict check here!
1400 Sets the scheme of the URL to \a scheme. As a scheme can only
1401 contain ASCII characters, no conversion or encoding is done on the
1404 The scheme describes the type (or protocol) of the URL. It's
1405 represented by one or more ASCII characters at the start of the URL,
1406 and is followed by a ':'. The following example shows a URL where
1407 the scheme is "ftp":
1409 \img qurl-authority2.png
1411 The scheme can also be empty, in which case the URL is interpreted
1414 \sa scheme(), isRelative()
1416 void QUrl::setScheme(const QString &scheme)
1419 if (scheme.isEmpty()) {
1420 // schemes are not allowed to be empty
1421 d->sectionIsPresent &= ~QUrlPrivate::Scheme;
1422 d->sectionHasError &= ~QUrlPrivate::Scheme;
1425 d->setScheme(scheme, scheme.length());
1430 Returns the scheme of the URL. If an empty string is returned,
1431 this means the scheme is undefined and the URL is then relative.
1433 \sa setScheme(), isRelative()
1435 QString QUrl::scheme() const
1437 if (!d) return QString();
1443 Sets the authority of the URL to \a authority.
1445 The authority of a URL is the combination of user info, a host
1446 name and a port. All of these elements are optional; an empty
1447 authority is therefore valid.
1449 The user info and host are separated by a '@', and the host and
1450 port are separated by a ':'. If the user info is empty, the '@'
1451 must be omitted; although a stray ':' is permitted if the port is
1454 The following example shows a valid authority string:
1456 \img qurl-authority.png
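As a rough illustration of the layout described above, the following
standalone sketch splits an authority string into user info, host and
port. It uses only the C++ standard library; \c AuthorityParts and
\c splitAuthority() are hypothetical names invented for this example and
are not part of QUrl. The sketch performs no validation and no
percent-decoding.

```cpp
#include <string>

// Hypothetical helper type for this sketch; not part of QUrl.
struct AuthorityParts {
    std::string userInfo; // text before the last '@', if any
    std::string host;     // host name or bracketed IP literal
    int port = -1;        // -1 means "unspecified", mirroring QUrl::port()
};

// Splits "userinfo@host:port". Assumes the input is only the authority.
AuthorityParts splitAuthority(const std::string &authority)
{
    AuthorityParts parts;
    std::string rest = authority;

    // The user info, if present, ends at the last '@'.
    const std::string::size_type at = rest.rfind('@');
    if (at != std::string::npos) {
        parts.userInfo = rest.substr(0, at);
        rest.erase(0, at + 1);
    }

    // The port follows the last ':' that is not inside a bracketed literal.
    const std::string::size_type colon = rest.rfind(':');
    const std::string::size_type bracket = rest.rfind(']');
    if (colon != std::string::npos
        && (bracket == std::string::npos || colon > bracket)) {
        const std::string digits = rest.substr(colon + 1);
        int value = digits.empty() ? -1 : 0; // a stray ':' leaves the port unset
        for (char c : digits) {
            if (c < '0' || c > '9' || value > 65535) {
                value = -1; // not a usable port number
                break;
            }
            value = value * 10 + (c - '0');
        }
        parts.port = value <= 65535 ? value : -1;
        rest.erase(colon);
    }

    parts.host = rest;
    return parts;
}
```

The port check comes after the user info is removed so that a ':' inside
the user info (between user name and password) is not mistaken for the
port separator.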
1458 void QUrl::setAuthority(const QString &authority)
1461 d->setAuthority(authority, 0, authority.length());
1462 if (authority.isNull()) {
1463 // QUrlPrivate::setAuthority cleared almost everything
1464 // but it leaves the Host bit set
1465 d->sectionIsPresent &= ~QUrlPrivate::Authority;
1470 Returns the authority of the URL if it is defined; otherwise
1471 an empty string is returned.
1475 QString QUrl::authority(ComponentFormattingOptions options) const
1477 if (!d) return QString();
1480 d->appendAuthority(result, options);
1485 Sets the user info of the URL to \a userInfo. The user info is an
1486 optional part of the authority of the URL, as described in
1489 The user info consists of a user name and optionally a password,
1490 separated by a ':'. If the password is empty, the colon must be
1491 omitted. The following example shows a valid user info string:
1493 \img qurl-authority3.png
1495 \sa userInfo(), setUserName(), setPassword(), setAuthority()
1497 void QUrl::setUserInfo(const QString &userInfo)
1500 QString trimmed = userInfo.trimmed();
1501 d->setUserInfo(trimmed, 0, trimmed.length());
1502 if (userInfo.isNull()) {
1503 // QUrlPrivate::setUserInfo cleared almost everything
1504 // but it leaves the UserName bit set
1505 d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
1510 Returns the user info of the URL, or an empty string if the user
1513 QString QUrl::userInfo(ComponentFormattingOptions options) const
1515 if (!d) return QString();
1518 d->appendUserInfo(result, options);
1523 Sets the URL's user name to \a userName. The \a userName is part
1524 of the user info element in the authority of the URL, as described
1527 \sa setEncodedUserName(), userName(), setUserInfo()
1529 void QUrl::setUserName(const QString &userName)
1532 d->setUserName(userName, 0, userName.length());
1533 if (userName.isNull())
1534 d->sectionIsPresent &= ~QUrlPrivate::UserName;
1538 Returns the user name of the URL if it is defined; otherwise
1539 an empty string is returned.
1541 \sa setUserName(), encodedUserName()
1543 QString QUrl::userName(ComponentFormattingOptions options) const
1545 if (!d) return QString();
1548 d->appendUserName(result, options);
1553 Sets the URL's password to \a password. The \a password is part of
1554 the user info element in the authority of the URL, as described in
1557 \sa password(), setUserInfo()
1559 void QUrl::setPassword(const QString &password)
1562 d->setPassword(password, 0, password.length());
1563 if (password.isNull())
1564 d->sectionIsPresent &= ~QUrlPrivate::Password;
1568 Returns the password of the URL if it is defined; otherwise
1569 an empty string is returned.
1573 QString QUrl::password(ComponentFormattingOptions options) const
1575 if (!d) return QString();
1578 d->appendPassword(result, options);
1583 Sets the host of the URL to \a host. The host is part of the
1586 \sa host(), setAuthority()
1588 void QUrl::setHost(const QString &host)
1591 if (d->setHost(host, 0, host.length())) {
1593 d->sectionIsPresent &= ~QUrlPrivate::Host;
1595 // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
1596 d->setHost(QLatin1Char('[') + host + QLatin1Char(']'), 0, host.length() + 2);
1601 Returns the host of the URL if it is defined; otherwise
1602 an empty string is returned.
1604 QString QUrl::host(ComponentFormattingOptions options) const
1606 if (!d) return QString();
1609 d->appendHost(result, options);
1610 if (result.startsWith(QLatin1Char('[')))
1611 return result.mid(1, result.length() - 2);
1616 Sets the port of the URL to \a port. The port is part of the
1617 authority of the URL, as described in setAuthority().
1619 \a port must be between 0 and 65535 inclusive. Setting the
1620 port to -1 indicates that the port is unspecified.
1622 void QUrl::setPort(int port)
1626 if (port < -1 || port > 65535) {
1627 qWarning("QUrl::setPort: Out of range");
1629 d->sectionHasError |= QUrlPrivate::Port;
1631 d->sectionHasError &= ~QUrlPrivate::Port;
1640 Returns the port of the URL, or \a defaultPort if the port is
1645 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 3
1647 int QUrl::port(int defaultPort) const
1649 if (!d) return defaultPort;
1650 return d->port == -1 ? defaultPort : d->port;
1654 Sets the path of the URL to \a path. The path is the part of the
1655 URL that comes after the authority but before the query string.
1657 \img qurl-ftppath.png
1659 For non-hierarchical schemes, the path will be everything
1660 following the scheme declaration, as in the following example:
1662 \img qurl-mailtopath.png
1666 void QUrl::setPath(const QString &path)
1669 d->setPath(path, 0, path.length());
1671 // optimized out, since there is no path delimiter
1672 // if (path.isNull())
1673 // d->sectionIsPresent &= ~QUrlPrivate::Path;
1677 Returns the path of the URL.
1681 QString QUrl::path(ComponentFormattingOptions options) const
1683 if (!d) return QString();
1686 d->appendPath(result, options);
1693 Returns true if this URL contains a query (i.e., if a '?' was seen in it).
1695 \sa hasQueryItem(), encodedQuery()
1697 bool QUrl::hasQuery() const
1699 if (!d) return false;
1700 return d->hasQuery();
1704 Sets the query string of the URL to \a query. The string is
1705 inserted as-is, and no further encoding is performed when calling
1708 This function is useful if you need to pass a query string that
1709 does not fit into the key-value pattern, or that uses a different
1710 scheme for encoding special characters than what is suggested by
1713 Passing a value of QString() to \a query (a null QString) unsets
1714 the query completely. However, passing a value of QString("")
1715 will set the query to an empty value, as if the original URL
1718 \sa encodedQuery(), hasQuery()
1720 void QUrl::setQuery(const QString &query)
1724 d->setQuery(query, 0, query.length());
1726 d->sectionIsPresent &= ~QUrlPrivate::Query;
1729 void QUrl::setQuery(const QUrlQuery &query)
1733 // we know the data is in the right format
1734 d->query = query.toString();
1735 if (query.isEmpty())
1736 d->sectionIsPresent &= ~QUrlPrivate::Query;
1738 d->sectionIsPresent |= QUrlPrivate::Query;
1742 Returns the query string of the URL in percent encoded form.
1744 QString QUrl::query(ComponentFormattingOptions options) const
1746 if (!d) return QString();
1749 d->appendQuery(result, options);
1750 if (d->hasQuery() && result.isNull())
1756 Sets the fragment of the URL to \a fragment. The fragment is the
1757 last part of the URL, represented by a '#' followed by a string of
1758 characters. It is typically used in HTTP for referring to a
1759 certain link or point on a page:
1761 \img qurl-fragment.png
1763 The fragment is sometimes also referred to as the URL "reference".
1765 Passing an argument of QString() (a null QString) will unset the fragment.
1766 Passing an argument of QString("") (an empty but not null QString)
1767 will set the fragment to an empty string (as if the original URL
1770 \sa fragment(), hasFragment()
1772 void QUrl::setFragment(const QString &fragment)
1776 d->setFragment(fragment, 0, fragment.length());
1777 if (fragment.isNull())
1778 d->sectionIsPresent &= ~QUrlPrivate::Fragment;
1782 Returns the fragment of the URL.
1786 QString QUrl::fragment(ComponentFormattingOptions options) const
1788 if (!d) return QString();
1791 d->appendFragment(result, options);
1792 if (d->hasFragment() && result.isNull())
1800 Returns true if this URL contains a fragment (i.e., if a '#' was seen in it).
1802 \sa fragment(), setFragment()
1804 bool QUrl::hasFragment() const
1806 if (!d) return false;
1807 return d->hasFragment();
1813 Returns the TLD (Top-Level Domain) of the URL (e.g. .co.uk, .net).
1814 Note that the return value is prefixed with a '.' unless the
1815 URL does not contain a valid TLD, in which case the function returns
1818 QString QUrl::topLevelDomain(ComponentFormattingOptions options) const
1820 QString tld = qTopLevelDomain(host());
1821 if ((options & DecodeUnicode) == 0) {
1822 return qt_ACE_do(tld, ToAceOnly);
1828 Returns the result of the merge of this URL with \a relative. This
1829 URL is used as a base to convert \a relative to an absolute URL.
1831 If \a relative is not a relative URL, this function will return \a
1832 relative directly. Otherwise, the paths of the two URLs are
1833 merged, and the new URL returned has the scheme and authority of
1834 the base URL, but with the merged path, as in the following
1837 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 5
1839 Calling resolved() with ".." returns a QUrl whose directory is
1840 one level higher than the original. Similarly, calling resolved()
1841 with "../.." removes two levels from the path. If \a relative is
1842 "/", the path becomes "/".
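The ".." handling described above relies on dot-segment removal as
specified in RFC 3986, section 5.2.4. The following standalone sketch
shows that algorithm on a plain \c std::string. It is an illustration
only, not the code used by QUrl; \c startsWith() and
\c removeDotSegments() are hypothetical names chosen for this example.

```cpp
#include <string>

// Hypothetical helper: true if 'in' begins with 'prefix'.
static bool startsWith(const std::string &in, const char *prefix)
{
    return in.compare(0, std::char_traits<char>::length(prefix), prefix) == 0;
}

// Removes "." and ".." segments from 'in', following the steps of
// RFC 3986, section 5.2.4.
std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (startsWith(in, "../")) {
            in.erase(0, 3);                  // drop a leading "../"
        } else if (startsWith(in, "./")) {
            in.erase(0, 2);                  // drop a leading "./"
        } else if (startsWith(in, "/./")) {
            in.erase(0, 2);                  // "/./x" becomes "/x"
        } else if (in == "/.") {
            in = "/";
        } else if (startsWith(in, "/../") || in == "/..") {
            // Replace the prefix with "/" and drop the last output segment
            // together with its preceding '/' (if any).
            in = (in == "/..") ? std::string("/") : in.substr(3);
            const std::string::size_type slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {
            in.clear();
        } else {
            // Move the first segment, including its leading '/', to the
            // output, up to but not including the next '/'.
            const std::string::size_type next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

With this sketch, the RFC 3986 example input "/a/b/c/./../../g" reduces
to "/a/g".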
1846 QUrl QUrl::resolved(const QUrl &relative) const
1848 if (!d) return relative;
1849 if (!relative.d) return *this;
1852 // be non strict and allow scheme in relative url
1853 if (!relative.d->scheme.isEmpty() && relative.d->scheme != d->scheme) {
1856 if (relative.d->hasAuthority()) {
1859 t.d = new QUrlPrivate;
1861 // copy the authority
1862 t.d->userName = d->userName;
1863 t.d->password = d->password;
1864 t.d->host = d->host;
1865 t.d->port = d->port;
1866 t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;
1868 if (relative.d->path.isEmpty()) {
1869 t.d->path = d->path;
1870 if (relative.d->hasQuery()) {
1871 t.d->query = relative.d->query;
1872 t.d->sectionIsPresent |= QUrlPrivate::Query;
1873 } else if (d->hasQuery()) {
1874 t.d->query = d->query;
1875 t.d->sectionIsPresent |= QUrlPrivate::Query;
1878 t.d->path = relative.d->path.startsWith(QLatin1Char('/'))
1880 : d->mergePaths(relative.d->path);
1881 if (relative.d->hasQuery()) {
1882 t.d->query = relative.d->query;
1883 t.d->sectionIsPresent |= QUrlPrivate::Query;
1887 t.d->scheme = d->scheme;
1889 t.d->sectionIsPresent |= QUrlPrivate::Scheme;
1891 t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
1893 t.d->fragment = relative.d->fragment;
1894 if (relative.d->hasFragment())
1895 t.d->sectionIsPresent |= QUrlPrivate::Fragment;
1897 t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;
1899 removeDotsFromPath(&t.d->path);
1901 #if defined(QURL_DEBUG)
1902 qDebug("QUrl(\"%s\").resolved(\"%s\") = \"%s\"",
1904 qPrintable(relative.url()),
1905 qPrintable(t.url()));
1911 Returns true if the URL is relative; otherwise returns false. A
1912 URL is relative if its scheme is undefined and its path does not
1913 start with a '/'.
1915 bool QUrl::isRelative() const
1917 if (!d) return true;
1918 return !d->hasScheme() && !d->path.startsWith(QLatin1Char('/'));
1922 Returns a string representation of the URL.
1923 The output can be customized by passing flags with \a options.
1925 The resulting QString can be passed back to a QUrl later on.
1927 Synonym for toString(options).
1929 \sa FormattingOptions, toEncoded(), toString()
1931 QString QUrl::url(FormattingOptions options) const
1933 return toString(options);
1937 Returns a string representation of the URL.
1938 The output can be customized by passing flags with \a options.
1940 \sa FormattingOptions, url(), setUrl()
1942 QString QUrl::toString(FormattingOptions options) const
1944 if (!d) return QString();
1946 // return just the path if:
1947 // - QUrl::PreferLocalFile is passed
1948 // - QUrl::RemovePath isn't passed (rather stupid if the user did...)
1949 // - there's no query or fragment to return
1950 // that is, either they aren't present, or we're removing them
1951 // - it's a local file
1952 // (test done last since it's the most expensive)
1953 if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
1954 && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
1955 && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
1957 return path(options);
1962 if (!(options & QUrl::RemoveScheme) && d->hasScheme())
1963 url += d->scheme + QLatin1Char(':');
1965 bool pathIsAbsolute = d->path.startsWith(QLatin1Char('/'));
1966 if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
1967 url += QLatin1String("//");
1968 d->appendAuthority(url, options);
1969 } else if (isLocalFile() && pathIsAbsolute) {
1970 url += QLatin1String("//");
1973 if (!(options & QUrl::RemovePath)) {
1974 // check if we need to insert a slash
1975 if (!pathIsAbsolute && !d->path.isEmpty() && !url.isEmpty() && !url.endsWith(QLatin1Char(':')))
1976 url += QLatin1Char('/');
1978 d->appendPath(url, options);
1979 // check if we need to remove trailing slashes
1980 while ((options & StripTrailingSlash) && url.endsWith(QLatin1Char('/')))
1984 if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
1985 url += QLatin1Char('?');
1986 d->appendQuery(url, options);
1988 if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
1989 url += QLatin1Char('#');
1990 d->appendFragment(url, options);
1999 Returns a human-displayable string representation of the URL.
2000 The output can be customized by passing flags with \a options.
2001 The option RemovePassword is always enabled, since passwords
2002 should never be shown back to users.
2004 With the default options, the resulting QString can be passed back
2005 to a QUrl later on, but any password that was present initially will
2008 \sa FormattingOptions, toEncoded(), toString()
2011 QString QUrl::toDisplayString(FormattingOptions options) const
2013 return toString(options | RemovePassword);
2017 Returns the encoded representation of the URL if it's valid;
2018 otherwise an empty QByteArray is returned. The output can be
2019 customized by passing flags with \a options.
2021 The user info, path and fragment are all converted to UTF-8, and
2022 all non-ASCII characters are then percent encoded. The host name
2023 is encoded using Punycode.
2025 QByteArray QUrl::toEncoded(FormattingOptions options) const
2027 QString stringForm = toString(options);
2028 if (options & DecodeUnicode)
2029 return stringForm.toUtf8();
2030 return stringForm.toLatin1();
2034 \fn QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode parsingMode)
2036 Parses \a input and returns the corresponding QUrl. \a input is
2037 assumed to be in encoded form, containing only ASCII characters.
2039 Parses the URL using \a parsingMode.
2041 \sa toEncoded(), setUrl()
2043 QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
2045 return QUrl(QString::fromUtf8(input.constData(), input.size()), mode);
2049 Returns a decoded copy of \a input. \a input is first decoded from
2050 percent encoding, then converted from UTF-8 to Unicode.
2052 QString QUrl::fromPercentEncoding(const QByteArray &input)
2054 return QString::fromUtf8(QByteArray::fromPercentEncoding(input));
2058 Returns an encoded copy of \a input. \a input is first converted
2059 to UTF-8, and all ASCII characters that are not in the unreserved group
2060 are percent encoded. To prevent characters from being percent encoded
2061 pass them to \a exclude. To force characters to be percent encoded pass
2064 Unreserved is defined as:
2065 ALPHA / DIGIT / "-" / "." / "_" / "~"
2067 \snippet doc/src/snippets/code/src_corelib_io_qurl.cpp 6
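The rule above can be sketched without Qt as follows. This is a minimal
illustration, assuming the input is already UTF-8 encoded; the function
name \c percentEncode() is hypothetical and the code is not the
implementation behind QByteArray::toPercentEncoding().

```cpp
#include <string>

// Percent-encodes every byte of 'utf8' that is not unreserved.
// Bytes listed in 'exclude' are kept as-is; bytes listed in 'include'
// are always encoded, even if they are unreserved or excluded.
std::string percentEncode(const std::string &utf8,
                          const std::string &exclude = std::string(),
                          const std::string &include = std::string())
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        // Unreserved: ALPHA / DIGIT / "-" / "." / "_" / "~"
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9')
                             || c == '-' || c == '.' || c == '_' || c == '~';
        const bool excluded = exclude.find(static_cast<char>(c)) != std::string::npos;
        const bool included = include.find(static_cast<char>(c)) != std::string::npos;
        if ((unreserved || excluded) && !included) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

Working on bytes rather than characters means that each byte of a
multi-byte UTF-8 sequence is encoded separately, so "é" (0xC3 0xA9)
becomes "%C3%A9".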
2069 QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
2071 return input.toUtf8().toPercentEncoding(exclude, include);
2075 \fn QByteArray QUrl::toPunycode(const QString &uc)
2077 Returns \a uc in Punycode encoding.
2079 Punycode is a Unicode encoding used for internationalized domain
2080 names, as defined in RFC 3492. If you want to convert a domain name from
2081 Unicode to its ASCII-compatible representation, use toAce().
2085 \fn QString QUrl::fromPunycode(const QByteArray &pc)
2087 Returns the Punycode decoded representation of \a pc.
2089 Punycode is a Unicode encoding used for internationalized domain
2090 names, as defined in RFC 3492. If you want to convert a domain from
2091 its ASCII-compatible encoding to the Unicode representation, use
2098 Returns the Unicode form of the given domain name
2099 \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
2100 The result of this function is considered equivalent to \a domain.
2102 If the value in \a domain cannot be encoded, it will be converted
2103 to QString and returned.
2105 The ASCII Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
2106 and RFC 3492. It is part of the Internationalizing Domain Names in
2107 Applications (IDNA) specification, which allows for domain names
2108 (like \c "example.com") to be written using international
2111 QString QUrl::fromAce(const QByteArray &domain)
2113 return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce);
2119 Returns the ASCII Compatible Encoding of the given domain name \a domain.
2120 The result of this function is considered equivalent to \a domain.
2122 The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
2123 and RFC 3492. It is part of the Internationalizing Domain Names in
2124 Applications (IDNA) specification, which allows for domain names
2125 (like \c "example.com") to be written using international
2128 This function returns an empty QByteArray if \a domain is not a valid
2129 hostname. Note, in particular, that IPv6 literals are not valid domain
2132 QByteArray QUrl::toAce(const QString &domain)
2134 QString result = qt_ACE_do(domain, ToAceOnly);
2135 return result.toLatin1();
2141 Returns true if this URL is "less than" the given \a url. This
2142 provides a means of ordering URLs.
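One common way to obtain such an ordering is to compare the components
lexicographically, one after the other. The sketch below shows this idea
on a simplified stand-in type using \c std::tie. \c UrlParts is a
hypothetical struct made up for this example, and the sketch does not
claim to reproduce the exact behavior of QUrl's operator.

```cpp
#include <string>
#include <tuple>

// Hypothetical, simplified stand-in for the URL components; not QUrlPrivate.
struct UrlParts {
    std::string scheme, userName, password, host;
    int port = -1; // -1 means "unspecified"
    std::string path, query, fragment;
};

// Lexicographic "less than": the first component that differs decides.
bool operator<(const UrlParts &a, const UrlParts &b)
{
    return std::tie(a.scheme, a.userName, a.password, a.host,
                    a.port, a.path, a.query, a.fragment)
         < std::tie(b.scheme, b.userName, b.password, b.host,
                    b.port, b.path, b.query, b.fragment);
}
```

An ordering of this kind is what allows values to be used as keys in
sorted containers or passed to sorting algorithms.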
2144 bool QUrl::operator <(const QUrl &url) const
2146 if (!d) return url.d;
2147 if (d->scheme < url.d->scheme)
2149 if (d->userName < url.d->userName)
2151 if (d->password < url.d->password)
2153 if (d->host < url.d->host)
2155 if (d->port < url.d->port)
2157 if (d->path < url.d->path)
2159 if (d->query < url.d->query)
2161 if (d->fragment < url.d->fragment)
2167 Returns true if this URL and the given \a url are equal;
2168 otherwise returns false.
2170 bool QUrl::operator ==(const QUrl &url) const
2174 return d->scheme == url.d->scheme &&
2175 d->userName == url.d->userName &&
2176 d->password == url.d->password &&
2177 d->host == url.d->host &&
2178 d->port == url.d->port &&
2179 d->path == url.d->path &&
2180 d->query == url.d->query &&
2181 d->fragment == url.d->fragment;
2185 Returns true if this URL and the given \a url are not equal;
2186 otherwise returns false.
2188 bool QUrl::operator !=(const QUrl &url) const
2190 return !(*this == url);
2194 Assigns the specified \a url to this object.
2196 QUrl &QUrl::operator =(const QUrl &url)
2205 qAtomicAssign(d, url.d);
2213 Assigns the specified \a url to this object.
2215 QUrl &QUrl::operator =(const QString &url)
2217 if (url.isEmpty()) {
2227 \fn void QUrl::swap(QUrl &other)
2230 Swaps URL \a other with this URL. This operation is very
2231 fast and never fails.
2241 d = new QUrlPrivate;
2249 bool QUrl::isDetached() const
2251 return !d || d->ref.load() == 1;
2256 Returns a QUrl representation of \a localFile, interpreted as a local
2257 file. This function accepts paths separated by slashes as well as the
2258 native separator for this platform.
2260 This function also accepts paths with a doubled leading slash (or
2261 backslash) to indicate a remote file, as in
2262 "//servername/path/to/file.txt". Note that only certain platforms can
2263 actually open this file using QFile::open().
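The conversion steps can be sketched on plain strings as follows. This
is an illustration under stated assumptions, not the QUrl code:
\c FileUrlParts and \c localFileToUrlParts() are hypothetical names, and
the sketch always treats a backslash as a separator, whereas the native
separator depends on the platform.

```cpp
#include <string>

// Hypothetical result type for this sketch; not part of QUrl.
struct FileUrlParts {
    std::string host; // server name for "//server/path" style input
    std::string path; // path with forward slashes and '%' escaped
};

FileUrlParts localFileToUrlParts(std::string file)
{
    FileUrlParts parts;

    // Normalise separators (assumption: '\\' is always a separator here).
    for (char &c : file) {
        if (c == '\\')
            c = '/';
    }

    if (file.size() > 1 && file[1] == ':' && file[0] != '/') {
        // Drive letter: "C:/dir" becomes "/C:/dir".
        file.insert(0, 1, '/');
    } else if (file.compare(0, 2, "//") == 0) {
        // Doubled leading slash: the first component is the server name.
        const std::string::size_type slash = file.find('/', 2);
        parts.host = file.substr(2, slash == std::string::npos ? slash : slash - 2);
        file = (slash == std::string::npos) ? std::string() : file.substr(slash);
    }

    // Escape '%' so it is not read as the start of a percent-encoding.
    for (char c : file) {
        if (c == '%')
            parts.path += "%25";
        else
            parts.path += c;
    }
    return parts;
}
```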
2265 \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
2267 QUrl QUrl::fromLocalFile(const QString &localFile)
2270 url.setScheme(fileScheme());
2271 QString deslashified = QDir::fromNativeSeparators(localFile);
2273 // magic for drives on windows
2274 if (deslashified.length() > 1 && deslashified.at(1) == QLatin1Char(':') && deslashified.at(0) != QLatin1Char('/')) {
2275 deslashified.prepend(QLatin1Char('/'));
2276 } else if (deslashified.startsWith(QLatin1String("//"))) {
2277 // magic for shared drive on windows
2278 int indexOfPath = deslashified.indexOf(QLatin1Char('/'), 2);
2279 url.setHost(deslashified.mid(2, indexOfPath - 2));
2280 if (indexOfPath > 2)
2281 deslashified = deslashified.right(deslashified.length() - indexOfPath);
2283 deslashified.clear();
2286 url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));
2291 Returns the path of this URL formatted as a local file path. The path
2292 returned will use forward slashes, even if it was originally created
2293 from one with backslashes.
2295 If this URL contains a non-empty hostname, it will be encoded in the
2296 returned value in the form found on SMB networks (for example,
2297 "//servername/path/to/file.txt").
2299 \sa fromLocalFile(), isLocalFile()
2301 QString QUrl::toLocalFile() const
2303 // the call to isLocalFile() also ensures that we're parsed
2308 QString ourPath = path();
2310 // magic for shared drive on windows
2311 if (!d->host.isEmpty()) {
2312 tmp = QStringLiteral("//") + d->host + (ourPath.length() > 0 && ourPath.at(0) != QLatin1Char('/')
2313 ? QLatin1Char('/') + ourPath : ourPath);
2316 // magic for drives on windows
2317 if (ourPath.length() > 2 && ourPath.at(0) == QLatin1Char('/') && ourPath.at(2) == QLatin1Char(':'))
2326 Returns true if this URL is pointing to a local file path. A URL is a
2327 local file path if the scheme is "file".
2329 Note that this function considers URLs with hostnames to be local file
2330 paths, even if the eventual file path cannot be opened with
2333 \sa fromLocalFile(), toLocalFile()
2335 bool QUrl::isLocalFile() const
2337 if (!d) return false;
2339 if (d->scheme != fileScheme())
2340 return false; // not file
2345 Returns true if this URL is a parent of \a childUrl. \a childUrl is a child
2346 of this URL if the two URLs share the same scheme and authority,
2347 and this URL's path is a parent of the path of \a childUrl.
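The path part of this test can be sketched on plain strings as shown
below. \c isParentPath() is a hypothetical helper written for this
example. It covers only the path comparison and assumes that scheme and
authority have already been checked.

```cpp
#include <string>

// Returns true if 'child' lies strictly below 'parent' in the hierarchy.
bool isParentPath(const std::string &parent, const std::string &child)
{
    // The child must be strictly longer and must start with the parent path.
    if (child.size() <= parent.size()
        || child.compare(0, parent.size(), parent) != 0)
        return false;

    // Either the parent already ends with '/', or the child continues
    // with '/' right after the parent, so "/a" is not a parent of "/ab".
    return (!parent.empty() && parent.back() == '/')
        || child[parent.size()] == '/';
}
```

The separator check is what distinguishes a real parent directory from
a path that merely shares a textual prefix.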
2349 bool QUrl::isParentOf(const QUrl &childUrl) const
2351 QString childPath = childUrl.path();
2354 return ((childUrl.scheme().isEmpty())
2355 && (childUrl.authority().isEmpty())
2356 && childPath.length() > 0 && childPath.at(0) == QLatin1Char('/'));
2358 QString ourPath = path();
2360 return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
2361 && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
2362 && childPath.startsWith(ourPath)
2363 && ((ourPath.endsWith(QLatin1Char('/')) && childPath.length() > ourPath.length())
2364 || (!ourPath.endsWith(QLatin1Char('/'))
2365 && childPath.length() > ourPath.length() && childPath.at(ourPath.length()) == QLatin1Char('/'))));
2369 #ifndef QT_NO_DATASTREAM
2372 Writes url \a url to the stream \a out and returns a reference
2375 \sa \link datastreamformat.html Format of the QDataStream operators \endlink
2377 QDataStream &operator<<(QDataStream &out, const QUrl &url)
2379 QByteArray u = url.toString(QUrl::FullyEncoded).toLatin1();
2386 Reads a url into \a url from the stream \a in and returns a
2387 reference to the stream.
2389 \sa \link datastreamformat.html Format of the QDataStream operators \endlink
2391 QDataStream &operator>>(QDataStream &in, QUrl &url)
2395 url.setUrl(QString::fromLatin1(u));
2398 #endif // QT_NO_DATASTREAM
2400 #ifndef QT_NO_DEBUG_STREAM
2401 QDebug operator<<(QDebug d, const QUrl &url)
2403 d.maybeSpace() << "QUrl(" << url.toDisplayString() << ')';
2411 Returns a text string that explains why a URL is invalid, if it is invalid;
2412 otherwise returns an empty string.
2414 QString QUrl::errorString() const
2420 \typedef QUrl::DataPtr
2425 \fn DataPtr &QUrl::data_ptr()
2429 // The following code has the following copyright:
2431 Copyright (C) Research In Motion Limited 2009. All rights reserved.
2433 Redistribution and use in source and binary forms, with or without
2434 modification, are permitted provided that the following conditions are met:
2435 * Redistributions of source code must retain the above copyright
2436 notice, this list of conditions and the following disclaimer.
2437 * Redistributions in binary form must reproduce the above copyright
2438 notice, this list of conditions and the following disclaimer in the
2439 documentation and/or other materials provided with the distribution.
2440 * Neither the name of Research In Motion Limited nor the
2441 names of its contributors may be used to endorse or promote products
2442 derived from this software without specific prior written permission.
2444 THIS SOFTWARE IS PROVIDED BY Research In Motion Limited ''AS IS'' AND ANY
2445 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2446 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
2447 DISCLAIMED. IN NO EVENT SHALL Research In Motion Limited BE LIABLE FOR ANY
2448 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
2449 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
2450 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
2451 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
2452 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
2453 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2459 Returns a valid URL from a user-supplied \a userInput string if one can be
2460 deduced. If that is not possible, an invalid QUrl() is returned.
2464 Most applications that can browse the web allow the user to input a URL
2465 in the form of a plain string. This string can be manually typed into
2466 a location bar, obtained from the clipboard, or passed in via command
2469 When the string is not already a valid URL, a best guess is performed,
2470 making various web-related assumptions.
2472 If the string corresponds to a valid file path on the system,
2473 a file:// URL is constructed using QUrl::fromLocalFile().
2475 If that is not the case, an attempt is made to turn the string into an
2476 http:// or ftp:// URL; the latter is chosen if the string starts with
2477 'ftp'. The result is then passed through QUrl's tolerant parser, and
2478 in case of success, a valid QUrl is returned, or else a QUrl().
2483 \li qt.nokia.com becomes http://qt.nokia.com
2484 \li ftp.qt.nokia.com becomes ftp://ftp.qt.nokia.com
2485 \li hostname becomes http://hostname
2486 \li /home/user/test.html becomes file:///home/user/test.html
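The examples above can be reproduced with a small, Qt-free sketch of the guessing step. It is a simplification under stated assumptions, not the QUrl implementation: the names trimCopy and guessUrlSketch are hypothetical, only POSIX-style absolute paths starting with '/' are treated as files, input that already contains "://" is returned unchanged, and no validation or tolerant parsing is performed.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s without leading/trailing whitespace.
static std::string trimCopy(const std::string &s)
{
    const char *ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical sketch of the scheme guess; returns an empty string for
// empty input (an assumption standing in for an invalid QUrl()).
std::string guessUrlSketch(const std::string &userInput)
{
    const std::string input = trimCopy(userInput);
    if (input.empty())
        return std::string();

    // Assumption: only POSIX-style absolute paths count as local files.
    if (input[0] == '/')
        return "file://" + input;

    // Assumption: anything containing "://" already carries a scheme.
    if (input.find("://") != std::string::npos)
        return input;

    // Take the text before the first '.', or the whole string if there
    // is no dot, and compare it case-insensitively against "ftp".
    std::string label = input.substr(0, input.find('.'));
    std::transform(label.begin(), label.end(), label.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    return (label == "ftp" ? "ftp://" : "http://") + input;
}
```

With these assumptions, guessUrlSketch("ftp.qt.nokia.com") yields "ftp://ftp.qt.nokia.com" and guessUrlSketch("hostname") yields "http://hostname". The host:port case, where the host could be mistaken for a scheme, is intentionally not covered by this sketch.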
2489 QUrl QUrl::fromUserInput(const QString &userInput)
2491 QString trimmedString = userInput.trimmed();
2493 // Check first for files, since on Windows drive letters can be interpreted as schemes
2494 if (QDir::isAbsolutePath(trimmedString))
2495 return QUrl::fromLocalFile(trimmedString);
2497 QUrl url = QUrl(trimmedString, QUrl::TolerantMode);
2498 QUrl urlPrepended = QUrl(QStringLiteral("http://") + trimmedString, QUrl::TolerantMode);
2500 // Check the most common case of a valid url with scheme and host
2501 // We check if the port would be valid by adding the scheme to handle the case host:port
2502 // where the host would be interpreted as the scheme
2504 && !url.scheme().isEmpty()
2505 && (!url.host().isEmpty() || !url.path().isEmpty())
2506 && urlPrepended.port() == -1)
2509 // Else, try the prepended one and adjust the scheme from the host name
2510 if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty()))
2512 int dotIndex = trimmedString.indexOf(QLatin1Char('.'));
2513 const QString hostscheme = trimmedString.left(dotIndex).toLower();
2514 if (hostscheme == ftpScheme())
2515 urlPrepended.setScheme(ftpScheme());
2516 return urlPrepended;