1 /****************************************************************************
3 ** Copyright (C) 2012 Nokia Corporation and/or its subsidiary(-ies).
4 ** Copyright (C) 2012 Intel Corporation.
5 ** Contact: http://www.qt-project.org/
7 ** This file is part of the QtCore module of the Qt Toolkit.
9 ** $QT_BEGIN_LICENSE:LGPL$
10 ** GNU Lesser General Public License Usage
11 ** This file may be used under the terms of the GNU Lesser General Public
12 ** License version 2.1 as published by the Free Software Foundation and
13 ** appearing in the file LICENSE.LGPL included in the packaging of this
14 ** file. Please review the following information to ensure the GNU Lesser
15 ** General Public License version 2.1 requirements will be met:
16 ** http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html.
18 ** In addition, as a special exception, Nokia gives you certain additional
19 ** rights. These rights are described in the Nokia Qt LGPL Exception
20 ** version 1.1, included in the file LGPL_EXCEPTION.txt in this package.
22 ** GNU General Public License Usage
23 ** Alternatively, this file may be used under the terms of the GNU General
24 ** Public License version 3.0 as published by the Free Software Foundation
25 ** and appearing in the file LICENSE.GPL included in the packaging of this
26 ** file. Please review the following information to ensure the GNU General
27 ** Public License version 3.0 requirements will be met:
28 ** http://www.gnu.org/copyleft/gpl.html.
31 ** Alternatively, this file may be used in accordance with the terms and
32 ** conditions contained in a signed written agreement between you and Nokia.
41 ****************************************************************************/
46 \brief The QUrl class provides a convenient interface for working
55 It can parse and construct URLs in both encoded and unencoded
56 form. QUrl also has support for internationalized domain names
59 The most common way to use QUrl is to initialize it via the
60 constructor by passing a QString. Otherwise, setUrl() and
61 setEncodedUrl() can also be used.
    URLs can be represented in two forms: encoded or unencoded. The
    unencoded representation is suitable for showing to users, but
    the encoded representation is typically what you would send to
    a web server. For example, the unencoded URL
    "http://b\\uuml\c{}hler.example.com/List of applicants.xml" would be sent to the server as
    "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
70 A URL can also be constructed piece by piece by calling
71 setScheme(), setUserName(), setPassword(), setHost(), setPort(),
72 setPath(), setEncodedQuery() and setFragment(). Some convenience
73 functions are also available: setAuthority() sets the user name,
74 password, host and port. setUserInfo() sets the user name and
    Call isValid() to check if the URL is valid. This can be done at
    any point while constructing a URL.
80 Constructing a query is particularly convenient through the use
81 of setQueryItems(), addQueryItem() and removeQueryItem(). Use
82 setQueryDelimiters() to customize the delimiters used for
83 generating the query string.
85 For the convenience of generating encoded URL strings or query
86 strings, there are two static functions called
87 fromPercentEncoding() and toPercentEncoding() which deal with
88 percent encoding and decoding of QStrings.
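
    As an illustration of what percent encoding and decoding involve, here is
    a minimal standalone C++ sketch. It is not QUrl's implementation:
    percentEncode(), percentDecode() and their helpers are hypothetical names,
    the sketch works on UTF-8 byte strings rather than QStrings, and it encodes
    every byte outside the RFC 3986 unreserved set, which may differ from what
    toPercentEncoding() does for some characters.

```cpp
#include <string>

// Hypothetical helpers for illustration; not part of QUrl.

// RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
static bool isUnreserved(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

// Returns the numeric value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Percent-encodes every byte that is not an unreserved character.
std::string percentEncode(const std::string &input)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : input) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Decodes well-formed "%XX" sequences; anything else is copied unchanged.
std::string percentDecode(const std::string &input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()
            && hexValue(input[i + 1]) >= 0 && hexValue(input[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(input[i + 1]) * 16 + hexValue(input[i + 2]));
            i += 2;
        } else {
            out += input[i];
        }
    }
    return out;
}
```

    Under these assumptions, percentEncode("List of applicants.xml") yields
    "List%20of%20applicants.xml", and percentDecode() reverses it.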
90 Calling isRelative() will tell whether or not the URL is
91 relative. A relative URL can be resolved by passing it as argument
92 to resolved(), which returns an absolute URL. isParentOf() is used
93 for determining whether one URL is a parent of another.
95 fromLocalFile() constructs a QUrl by parsing a local
96 file path. toLocalFile() converts a URL to a local file path.
    The human-readable representation of the URL is fetched with
    toString(). This representation is appropriate for displaying a
    URL to a user in unencoded form. The encoded form, as
    returned by toEncoded(), is for internal use and for passing to web
    servers, mail clients and so on.
104 QUrl conforms to the URI specification from
105 \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
106 scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
107 folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
108 Profile for Internationalized Domain Names (IDN)).
110 \section2 Character Conversions
112 Follow these rules to avoid erroneous character conversion when
113 dealing with URLs and strings:
    \li When creating a QString to contain a URL from a QByteArray or a
       char*, always use QString::fromUtf8().
122 \enum QUrl::ParsingMode
124 The parsing mode controls the way QUrl parses strings.
126 \value TolerantMode QUrl will try to correct some common errors in URLs.
127 This mode is useful for parsing URLs coming from sources
128 not known to be strictly standards-conforming.
130 \value StrictMode Only valid URLs are accepted. This mode is useful for
131 general URL validation.
133 \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
134 where percent characters stand for themselves, not as the beginning
135 of a percent-encoded sequence. This mode is only valid for the
136 setters setting components of a URL; it is not permitted in
137 the QUrl constructor, in fromEncoded() or in setUrl().
139 In TolerantMode, the parser has the following behaviour:
143 \li Spaces and "%20": unencoded space characters will be accepted and will
144 be treated as equivalent to "%20".
146 \li Single "%" characters: Any occurrences of a percent character "%" not
147 followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
148 will be replaced by "%25". Note that one lone "%" character will trigger
149 the correction mode for all percent characters.
151 \li Reserved and unreserved characters: An encoded URL should only
152 contain a few characters as literals; all other characters should
153 be percent-encoded. In TolerantMode, these characters will be
154 automatically percent-encoded where they are not allowed:
155 space / double-quote / "<" / ">" / "\" /
156 "^" / "`" / "{" / "|" / "}"
157 Those same characters can be decoded again by passing QUrl::DecodeReserved
158 to toString() or toEncoded().
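
    The space and lone "%" corrections described above can be sketched in
    standalone C++ as follows. This is only one reading of the rules in this
    list, not QUrl's parser: tolerantFix() and its helpers are hypothetical
    names, and the sketch treats a "%" as well-formed when at least two
    hexadecimal digits follow it.

```cpp
#include <string>

// Hypothetical helpers for illustration; not QUrl's parser.

static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Returns true if some '%' is not followed by two hexadecimal digits.
static bool hasLonePercent(const std::string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%')
            continue;
        if (i + 2 >= s.size() || !isHexDigit(s[i + 1]) || !isHexDigit(s[i + 2]))
            return true;
    }
    return false;
}

// Spaces become "%20". If any lone '%' exists, every '%' becomes "%25",
// mirroring the note that one lone '%' triggers correction for all of them.
std::string tolerantFix(const std::string &s)
{
    const bool fixAllPercents = hasLonePercent(s);
    std::string out;
    for (char c : s) {
        if (c == ' ')
            out += "%20";
        else if (c == '%' && fixAllPercents)
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    With this sketch, "13% coverage.html" becomes "13%25%20coverage.html",
    while "a%20b c" becomes "a%20b%20c" because its only "%" is well-formed.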
162 When in StrictMode, if a parsing error is found, isValid() will return \c
163 false and errorString() will return a simple message describing the error.
164 If more than one error is detected, it is undefined which error gets
167 Note that TolerantMode is not usually enough for parsing user input, which
168 often contains more errors and expectations than the parser can deal with.
169 When dealing with data coming directly from the user -- as opposed to data
170 coming from data-transfer sources, such as other programs -- it is
171 recommended to use fromUserInput().
173 \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
177 \enum QUrl::FormattingOptions
179 The formatting options define how the URL is formatted when written out
182 \value None The format of the URL is unchanged.
183 \value RemoveScheme The scheme is removed from the URL.
184 \value RemovePassword Any password in the URL is removed.
185 \value RemoveUserInfo Any user information in the URL is removed.
186 \value RemovePort Any specified port is removed from the URL.
187 \value RemoveAuthority
188 \value RemovePath The URL's path is removed, leaving only the scheme,
189 host address, and port (if present).
190 \value RemoveQuery The query part of the URL (following a '?' character)
192 \value RemoveFragment
193 \value PreferLocalFile If the URL is a local file according to isLocalFile()
194 and contains no query or fragment, a local file path is returned.
195 \value StripTrailingSlash The trailing slash is removed if one is present.
    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.
201 The options from QUrl::ComponentFormattingOptions are also possible.
203 \sa QUrl::ComponentFormattingOptions
207 \enum QUrl::ComponentFormattingOptions
    The component formatting options define how the components of a URL will
211 be formatted when written out as text. They can be combined with the
212 options from QUrl::FormattingOptions when used in toString() and
215 \value PrettyDecoded The component is returned in a "pretty form", with
216 most percent-encoded characters decoded. The exact
217 behavior of PrettyDecoded varies from component to
218 component and may also change from Qt release to Qt
219 release. This is the default.
221 \value EncodeSpaces Leave space characters in their encoded form ("%20").
223 \value EncodeUnicode Leave non-US-ASCII characters encoded in their UTF-8
224 percent-encoded form (e.g., "%C3%A9" for the U+00E9
225 codepoint, LATIN SMALL LETTER E WITH ACUTE).
    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
232 \value EncodeReserved Leave the US-ASCII reserved characters in their encoded
    \value DecodeReserved Decode the US-ASCII reserved characters.
237 \value FullyEncoded Leave all characters in their properly-encoded form,
238 as this component would appear as part of a URL. When
239 used with toString(), this produces a fully-compliant
240 URL in QString form, exactly equal to the result of
243 \value FullyDecoded Attempt to decode as much as possible. For individual
244 components of the URL, this decodes every percent
245 encoding sequence, including control characters (U+0000
246 to U+001F) and UTF-8 sequences found in percent-encoded form.
247 Note: if the component contains non-US-ASCII sequences
248 that aren't valid UTF-8 sequences, the behaviour is
249 undefined since QString cannot represent those values
                          This mode should not be used in functions where more
                          than one URL component is returned (userInfo() and authority())
                          and it is not allowed in url() and toString().
    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behaviour is undefined if that happens. They are provided
    as separate values because the behaviour of the "pretty mode" with regard
    to reserved characters is different on certain components and especially on
261 The FullyDecoded mode is similar to the behaviour of the functions
262 returning QString in Qt 4.x, including the fact that they will most likely
263 cause data loss if the component in question contains a non-UTF-8
264 percent-encoded sequence. Fortunately, those cases aren't common, so this
265 mode should be used when the component in question is used in a non-URL
266 context. For example, in an FTP client application, the path to the remote
267 file could be stored in a QUrl object, and the string to be transmitted to
268 the FTP server should be obtained using this flag.
270 \sa QUrl::FormattingOptions
275 #include "qplatformdefs.h"
277 #include "qstringlist.h"
280 #include "qdir.h" // for QDir::fromNativeSeparators
281 #include "qtldurl_p.h"
282 #include "private/qipaddress_p.h"
283 #include "qurlquery.h"
284 #if defined(Q_OS_WINCE_WM)
285 #pragma optimize("g", off)
290 inline static bool isHex(char c)
293 return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
296 static inline QString ftpScheme()
298 return QStringLiteral("ftp");
301 static inline QString httpScheme()
303 return QStringLiteral("http");
306 static inline QString fileScheme()
308 return QStringLiteral("file");
311 QUrlPrivate::QUrlPrivate()
313 errorCode(NoError), errorSupplement(0),
314 sectionIsPresent(0), sectionHasError(0)
QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
319 : ref(1), port(copy.port),
321 userName(copy.userName),
322 password(copy.password),
326 fragment(copy.fragment),
327 errorCode(copy.errorCode),
328 errorSupplement(copy.errorSupplement),
329 sectionIsPresent(copy.sectionIsPresent),
330 sectionHasError(copy.sectionHasError)
// From RFC 3986, Appendix A Collected ABNF for URI
335 // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
337 // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
339 // authority = [ userinfo "@" ] host [ ":" port ]
340 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
341 // host = IP-literal / IPv4address / reg-name
344 // reg-name = *( unreserved / pct-encoded / sub-delims )
346 // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
348 // query = *( pchar / "/" / "?" )
350 // fragment = *( pchar / "/" / "?" )
352 // pct-encoded = "%" HEXDIG HEXDIG
354 // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
355 // reserved = gen-delims / sub-delims
356 // gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
357 // sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
358 // / "*" / "+" / "," / ";" / "="
359 // the path component has a complex ABNF that basically boils down to
360 // slash-separated segments of "pchar"
362 // The above is the strict definition of the URL components and it is what we
363 // return encoded as FullyEncoded. However, we store the equivalent to
364 // PrettyDecoded internally, as that is the default formatting mode and most
365 // likely to be used. PrettyDecoded decodes spaces, unicode sequences and
366 // unambiguous delimiters.
// An ambiguous delimiter is a delimiter that, if it appeared decoded, would be
// interpreted as the beginning of a new component. The exact delimiters that
370 // match that definition change according to the use. When each field is
371 // considered in isolation from the rest, there are no ambiguities. In other
372 // words, we always store the most decoded form (except for the query, see
375 // The ambiguities arise when components are put together. From last to first
376 // component of a full URL, the ambiguities are:
377 // - fragment: none, since it's the last.
378 // - query: the "#" character is ambiguous, as it starts the fragment. In
379 // addition, the "+" character is treated specially, as should be both
380 // intra-query delimiters. Since we don't know which ones they are, we
381 // keep all reserved characters untouched.
//  - path: the "#" and "?" characters are ambiguous. In addition to them,
//    the slash itself is considered special.
384 // - host: completely special but never ambiguous, see setHost() below.
385 // - password: the "#", "?", "/", "[", "]" and "@" characters are ambiguous
386 // - username: the "#", "?", "/", "[", "]", "@", and ":" characters are ambiguous
387 // - scheme: doesn't accept any delimiter, see setScheme() below.
389 // When the authority component is considered in isolation, the ambiguities of
390 // its components are:
391 // - host: special, never ambiguous
392 // - password: "[", "]", "@" are ambiguous
393 // - username: "[", "]", "@", ":" are ambiguous
395 // Finally, when the userinfo is considered in isolation, the ambiguities of its
397 // - password: none, since it's the last
398 // - username: ":" is ambiguous
400 // list the recoding table modifications to be used with the recodeFromUser and
401 // appendToUser functions, according to the rules above.
402 // the encodedXXX tables are run with the delimiters set to "leave" by default;
403 // the decodedXXX tables are run with the delimiters set to "decode" by default
404 // (except for the query, which doesn't use these functions)
406 #define decode(x) ushort(x)
407 #define leave(x) ushort(0x100 | (x))
408 #define encode(x) ushort(0x200 | (x))
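
// Illustration only, not part of QUrl: a minimal standalone sketch of how an
// action entry built by the macros above packs the action into the high byte
// and the character into the low byte. The names Action, makeEntry, entryChar,
// entryAction and lookup are hypothetical, and the sketch assumes that a table
// is terminated by a 0 entry and that a null table means "no overrides".

```cpp
#include <cstdint>

// Hypothetical names mirroring decode(x), leave(x) and encode(x).
enum class Action : std::uint16_t { Decode = 0x000, Leave = 0x100, Encode = 0x200 };

// Packs an action and an ASCII character into one 16-bit entry.
constexpr std::uint16_t makeEntry(Action a, char c)
{
    return static_cast<std::uint16_t>(static_cast<std::uint16_t>(a)
                                      | static_cast<unsigned char>(c));
}

// Low byte: the character the entry applies to.
constexpr char entryChar(std::uint16_t e) { return static_cast<char>(e & 0xFF); }

// High byte: what to do with that character.
constexpr Action entryAction(std::uint16_t e) { return static_cast<Action>(e & 0xFF00); }

// Assumption: tables end with a 0 entry. Returns the fallback action when the
// table is null or does not list the character.
inline Action lookup(const std::uint16_t *table, char c, Action fallback)
{
    for (; table && *table; ++table) {
        if (entryChar(*table) == c)
            return entryAction(*table);
    }
    return fallback;
}
```

// Under these assumptions, makeEntry(Action::Encode, '#') is 0x223 and
// makeEntry(Action::Leave, ':') is 0x13A.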
410 static const ushort encodedUserNameActions[] = {
411 // first field, everything must be encoded, including the ":"
412 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
422 static const ushort * const decodedUserNameInAuthorityActions = encodedUserNameActions + 3;
423 static const ushort * const decodedUserNameInUserInfoActions = encodedUserNameActions + 6;
424 static const ushort * const decodedUserNameInUrlActions = encodedUserNameActions;
425 static const ushort * const decodedUserNameInIsolationActions = 0;
427 static const ushort encodedPasswordActions[] = {
428 // same as encodedUserNameActions, but decode ":"
429 // userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
438 static const ushort * const decodedPasswordInAuthorityActions = encodedPasswordActions + 3;
439 static const ushort * const decodedPasswordInUserInfoActions = 0;
440 static const ushort * const decodedPasswordInUrlActions = encodedPasswordActions;
441 static const ushort * const decodedPasswordInIsolationActions = 0;
443 static const ushort encodedPathActions[] = {
444 // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
452 static const ushort decodedPathInUrlActions[] = {
460 static const ushort * const decodedPathInIsolationActions = encodedPathActions + 4; // leave('/')
462 static const ushort encodedFragmentActions[] = {
463 // fragment = *( pchar / "/" / "?" )
464 // gen-delims permitted: ":" / "@" / "/" / "?"
465 // -> must encode: "[" / "]" / "#"
466 // HOWEVER: we allow "#" to remain decoded
476 static const ushort * const decodedFragmentInUrlActions = 0;
477 static const ushort * const decodedFragmentInIsolationActions = 0;
479 // the query is handled specially: the decodedQueryXXX tables are run with
480 // the delimiters set to "leave" by default and the others set to "encode"
481 static const ushort encodedQueryActions[] = {
482 // query = *( pchar / "/" / "?" )
483 // gen-delims permitted: ":" / "@" / "/" / "?"
    //  HOWEVER: we leave them alone, plus "[" and "]"
485 // -> must encode: "#"
489 static const ushort decodedQueryInIsolationActions[] = {
501 static const ushort decodedQueryInUrlActions[] = {
508 static inline void parseDecodedComponent(QString &data)
510 data.replace(QLatin1Char('%'), QStringLiteral("%25"));
513 static inline QString
514 recodeFromUser(const QString &input, const ushort *actions, int from, int to)
517 const QChar *begin = input.constData() + from;
518 const QChar *end = input.constData() + to;
519 if (qt_urlRecode(output, begin, end,
520 QUrl::DecodeReserved, actions))
523 return input.mid(from, to - from);
526 // appendXXXX functions:
527 // the internal value is stored in its most decoded form, so that case is easy.
528 // DecodeUnicode and DecodeSpaces are handled by qt_urlRecode.
529 // That leaves these functions to handle two cases related to delimiters:
530 // 1) encoded encodedXXXX tables
531 // 2) decoded decodedXXXX tables
532 static inline void appendToUser(QString &appendTo, const QString &value, QUrl::FormattingOptions options,
533 const ushort *encodedActions, const ushort *decodedActions)
535 if (options == QUrl::PrettyDecoded) {
540 const ushort *actions = 0;
541 if (options & QUrl::EncodeDelimiters)
542 actions = encodedActions;
544 actions = decodedActions;
546 if (!qt_urlRecode(appendTo, value.constData(), value.constEnd(), options, actions))
550 inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
552 if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
553 appendUserInfo(appendTo, options, appendingTo);
555 // add '@' only if we added anything
556 if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
557 appendTo += QLatin1Char('@');
559 appendHost(appendTo, options);
560 if (!(options & QUrl::RemovePort) && port != -1)
561 appendTo += QLatin1Char(':') + QString::number(port);
564 inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
566 if (Q_LIKELY(userName.isEmpty() && password.isEmpty()))
569 const ushort *userNameActions;
570 const ushort *passwordActions;
571 if (options & QUrl::EncodeDelimiters) {
572 userNameActions = encodedUserNameActions;
573 passwordActions = encodedPasswordActions;
575 switch (appendingTo) {
577 userNameActions = decodedUserNameInUserInfoActions;
578 passwordActions = decodedPasswordInUserInfoActions;
582 userNameActions = decodedUserNameInAuthorityActions;
583 passwordActions = decodedPasswordInAuthorityActions;
588 userNameActions = decodedUserNameInUrlActions;
589 passwordActions = decodedPasswordInUrlActions;
594 if ((options & QUrl::EncodeReserved) == 0)
595 options |= QUrl::DecodeReserved;
597 if (!qt_urlRecode(appendTo, userName.constData(), userName.constEnd(), options, userNameActions))
598 appendTo += userName;
599 if (options & QUrl::RemovePassword || !hasPassword()) {
602 appendTo += QLatin1Char(':');
603 if (!qt_urlRecode(appendTo, password.constData(), password.constEnd(), options, passwordActions))
604 appendTo += password;
608 inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
610 appendToUser(appendTo, userName, options, encodedUserNameActions, decodedUserNameInIsolationActions);
613 inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
615 appendToUser(appendTo, password, options, encodedPasswordActions, decodedPasswordInIsolationActions);
618 inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
620 if (appendingTo != Path && !(options & QUrl::EncodeDelimiters)) {
621 if (!qt_urlRecode(appendTo, path.constData(), path.constEnd(), options, decodedPathInUrlActions))
625 appendToUser(appendTo, path, options, encodedPathActions, decodedPathInIsolationActions);
629 inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options) const
631 appendToUser(appendTo, fragment, options, encodedFragmentActions, decodedFragmentInIsolationActions);
634 inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
636 // almost the same code as the previous functions
637 // except we prefer not to touch the delimiters
638 if (options == QUrl::PrettyDecoded && appendingTo == Query) {
643 const ushort *actions = 0;
644 if (options & QUrl::EncodeDelimiters) {
645 actions = encodedQueryActions;
647 // reset to default qt_urlRecode behaviour (leave delimiters alone)
648 options |= QUrl::EncodeDelimiters;
649 actions = appendingTo == Query ? decodedQueryInIsolationActions : decodedQueryInUrlActions;
652 if (!qt_urlRecode(appendTo, query.constData(), query.constData() + query.length(),
659 bool QUrlPrivate::setScheme(const QString &value, int len, bool decoded)
661 // schemes are strictly RFC-compliant:
662 // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
663 // but we need to decode any percent-encoding sequences that fall on
665 // we also lowercase the scheme
668 sectionIsPresent |= Scheme;
669 sectionHasError |= Scheme; // assume it has errors, we'll clear before returning true
670 errorCode = SchemeEmptyError;
675 errorCode = InvalidSchemeError;
676 int needsLowercasing = -1;
677 const ushort *p = reinterpret_cast<const ushort *>(value.constData());
678 for (int i = 0; i < len; ++i) {
679 if (p[i] >= 'a' && p[i] <= 'z')
681 if (p[i] >= 'A' && p[i] <= 'Z') {
682 needsLowercasing = i;
685 if (p[i] >= '0' && p[i] <= '9' && i > 0)
687 if (p[i] == '+' || p[i] == '-' || p[i] == '.')
691 // found a percent-encoded sign
692 // if we haven't decoded yet, decode and try again
693 errorSupplement = '%';
697 QString decodedScheme;
698 if (qt_urlRecode(decodedScheme, value.constData(), value.constData() + len, 0, 0) == 0)
700 return setScheme(decodedScheme, decodedScheme.length(), true);
703 // found something else
704 errorSupplement = p[i];
708 scheme = value.left(len);
709 sectionHasError &= ~Scheme;
712 if (needsLowercasing != -1) {
713 // schemes are ASCII only, so we don't need the full Unicode toLower
714 QChar *schemeData = scheme.data(); // force detaching here
715 for (int i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
717 if (c >= 'A' && c <= 'Z')
718 schemeData[i] = c + 0x20;
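
// Illustration only, not QUrlPrivate::setScheme(): a minimal standalone sketch
// of validating a scheme against the ABNF quoted above and lowercasing it.
// normalizeScheme is a hypothetical name. The sketch follows the ABNF strictly
// (the first character must be a letter) and does not attempt the
// percent-decoding retry performed by the function above.

```cpp
#include <string>

// Hypothetical helper for illustration.
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// Returns false and leaves `out` untouched if `in` is empty or invalid.
bool normalizeScheme(const std::string &in, std::string &out)
{
    if (in.empty())
        return false;

    std::string result;
    result.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c >= 'a' && c <= 'z') {
            result += c;
        } else if (c >= 'A' && c <= 'Z') {
            // schemes are ASCII only, so plain ASCII lowercasing is enough
            result += static_cast<char>(c + 0x20);
        } else if (i > 0 && ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.')) {
            result += c;
        } else {
            return false;
        }
    }
    out = result;
    return true;
}
```

// With this sketch, "HTTP" normalizes to "http" and "svn+ssh" is accepted
// unchanged, while "1http" and the empty string are rejected.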
724 bool QUrlPrivate::setAuthority(const QString &auth, int from, int end)
726 sectionHasError &= ~Authority;
727 sectionIsPresent &= ~Authority;
728 sectionIsPresent |= Host;
737 int userInfoIndex = auth.indexOf(QLatin1Char('@'), from);
738 if (uint(userInfoIndex) < uint(end)) {
739 setUserInfo(auth, from, userInfoIndex);
740 from = userInfoIndex + 1;
743 int colonIndex = auth.lastIndexOf(QLatin1Char(':'), end - 1);
744 if (colonIndex < from)
747 if (uint(colonIndex) < uint(end)) {
748 if (auth.at(from).unicode() == '[') {
749 // check if colonIndex isn't inside the "[...]" part
750 int closingBracket = auth.indexOf(QLatin1Char(']'), from);
751 if (uint(closingBracket) > uint(colonIndex))
756 if (colonIndex == end - 1) {
757 // found a colon but no digits after it
758 sectionHasError |= Port;
759 errorCode = PortEmptyError;
760 } else if (uint(colonIndex) < uint(end)) {
762 for (int i = colonIndex + 1; i < end; ++i) {
763 ushort c = auth.at(i).unicode();
764 if (c >= '0' && c <= '9') {
768 sectionHasError |= Port;
769 errorCode = InvalidPortError;
770 x = ulong(-1); // x != ushort(x)
774 if (x == ushort(x)) {
777 sectionHasError |= Port;
778 errorCode = InvalidPortError;
784 return setHost(auth, from, qMin<uint>(end, colonIndex)) && !(sectionHasError & Port);
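
// Illustration only, not part of QUrl: a minimal standalone sketch of the port
// check performed above, where the digits after the last ':' must form a
// number that fits in 16 bits. parsePort is a hypothetical name. Unlike the
// code above, which accumulates the value and then compares x with ushort(x),
// this sketch rejects out-of-range values as soon as they occur so that very
// long digit strings cannot overflow the accumulator.

```cpp
#include <string>

// Hypothetical helper for illustration.
// Returns the port in [0, 65535], or -1 if `digits` is empty, contains a
// non-digit character, or denotes a value above 65535.
int parsePort(const std::string &digits)
{
    if (digits.empty())
        return -1; // a colon with no digits after it

    unsigned long x = 0;
    for (char c : digits) {
        if (c < '0' || c > '9')
            return -1; // invalid character in the port
        x = x * 10 + static_cast<unsigned long>(c - '0');
        if (x > 0xFFFF)
            return -1; // does not fit in 16 bits
    }
    return static_cast<int>(x);
}
```

// With this sketch, "80" and "65535" are accepted, while "65536", "8a" and
// the empty string are rejected.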
787 void QUrlPrivate::setUserInfo(const QString &userInfo, int from, int end)
789 int delimIndex = userInfo.indexOf(QLatin1Char(':'), from);
790 setUserName(userInfo, from, qMin<uint>(delimIndex, end));
792 if (uint(delimIndex) >= uint(end)) {
794 sectionIsPresent &= ~Password;
795 sectionHasError &= ~Password;
797 setPassword(userInfo, delimIndex + 1, end);
801 inline void QUrlPrivate::setUserName(const QString &value, int from, int end)
803 sectionIsPresent |= UserName;
804 sectionHasError &= ~UserName;
805 userName = recodeFromUser(value, decodedUserNameInIsolationActions, from, end);
808 inline void QUrlPrivate::setPassword(const QString &value, int from, int end)
810 sectionIsPresent |= Password;
811 sectionHasError &= ~Password;
812 password = recodeFromUser(value, decodedPasswordInIsolationActions, from, end);
815 inline void QUrlPrivate::setPath(const QString &value, int from, int end)
817 // sectionIsPresent |= Path; // not used, save some cycles
818 sectionHasError &= ~Path;
819 path = recodeFromUser(value, decodedPathInIsolationActions, from, end);
822 // check for the "path-noscheme" case
823 // if the path contains a ":" before the first "/", it could be misinterpreted
827 inline void QUrlPrivate::setFragment(const QString &value, int from, int end)
829 sectionIsPresent |= Fragment;
830 sectionHasError &= ~Fragment;
831 fragment = recodeFromUser(value, decodedFragmentInIsolationActions, from, end);
834 inline void QUrlPrivate::setQuery(const QString &value, int from, int iend)
836 sectionIsPresent |= Query;
837 sectionHasError &= ~Query;
839 // use the default actions for the query (don't set QUrl::DecodeAllDelimiters)
841 const QChar *begin = value.constData() + from;
842 const QChar *end = value.constData() + iend;
844 // leave delimiters alone but decode the rest
845 if (qt_urlRecode(output, begin, end, QUrl::EncodeDelimiters,
846 decodedQueryInIsolationActions))
849 query = value.mid(from, iend - from);
853 // The RFC says the host is:
854 // host = IP-literal / IPv4address / reg-name
855 // IP-literal = "[" ( IPv6address / IPvFuture ) "]"
856 // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
857 // [a strict definition of IPv6Address and IPv4Address]
858 // reg-name = *( unreserved / pct-encoded / sub-delims )
860 // We deviate from the standard in all but IPvFuture. For IPvFuture we accept
861 // and store only exactly what the RFC says we should. No percent-encoding is
862 // permitted in this field, so Unicode characters and space aren't either.
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, we comply with the RFC and
// it's exclusively composed of unreserved characters.
869 // For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
870 // addresses, the so-called v4-compat and v4-mapped addresses. We also store
871 // those addresses like that in the hostname field, which violates the spec.
872 // IPv6 hosts are stored with the square brackets in the QString. It also
873 // requires no transformation in any way.
875 // As for registered names, it's the other way around: we accept only valid
876 // hostnames as specified by STD 3 and IDNA. That means everything we accept is
877 // valid in the RFC definition above, but there are many valid reg-names
878 // according to the RFC that we do not accept in the name of security. Since we
879 // do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
880 // specified by the DecodeUnicode flag. The hostname is stored in its Unicode form.
882 inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
884 // this is the only flag that matters
885 options &= QUrl::EncodeUnicode;
888 if (host.at(0).unicode() == '[') {
889 // IPv6Address and IPvFuture address never require any transformation
892 // this is either an IPv4Address or a reg-name
893 // if it is a reg-name, it is already stored in Unicode form
894 if (options == QUrl::EncodeUnicode)
895 appendTo += qt_ACE_do(host, ToAceOnly);
901 // the whole IPvFuture is passed and parsed here, including brackets
902 static int parseIpFuture(QString &host, const QChar *begin, const QChar *end)
904 // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
905 static const char acceptable[] =
906 "!$&'()*+,;=" // sub-delims
908 "-._~"; // unreserved
910 // the brackets and the "v" have been checked
911 if (begin[3].unicode() != '.')
912 return begin[3].unicode();
    if ((begin[2].unicode() >= 'A' && begin[2].unicode() <= 'F') ||
            (begin[2].unicode() >= 'a' && begin[2].unicode() <= 'f') ||
            (begin[2].unicode() >= '0' && begin[2].unicode() <= '9')) {
916 // this is so unlikely that we'll just go down the slow path
917 // decode the whole string, skipping the "[vH." and "]" which we already know to be there
918 host += QString::fromRawData(begin, 4);
923 if (qt_urlRecode(decoded, begin, end, QUrl::FullyEncoded, 0)) {
924 begin = decoded.constBegin();
925 end = decoded.constEnd();
928 for ( ; begin != end; ++begin) {
929 if (begin->unicode() >= 'A' && begin->unicode() <= 'Z')
931 else if (begin->unicode() >= 'a' && begin->unicode() <= 'z')
933 else if (begin->unicode() >= '0' && begin->unicode() <= '9')
935 else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != 0)
938 return begin->unicode();
940 host += QLatin1Char(']');
943 return begin[2].unicode();
946 // ONLY the IPv6 address is parsed here, WITHOUT the brackets
947 static bool parseIp6(QString &host, const QChar *begin, const QChar *end)
949 QIPAddressUtils::IPv6Address address;
950 if (!QIPAddressUtils::parseIp6(address, begin, end)) {
951 // IPv6 failed parsing, check if it was a percent-encoded character in
952 // the middle and try again
954 if (!qt_urlRecode(decoded, begin, end, QUrl::FullyEncoded, 0)) {
955 // no transformation, nothing to re-parse
960 // if the parsing fails again, the qt_urlRecode above will return 0
961 return parseIp6(host, decoded.constBegin(), decoded.constEnd());
964 host.reserve(host.size() + (end - begin));
965 host += QLatin1Char('[');
966 QIPAddressUtils::toString(host, address);
967 host += QLatin1Char(']');
971 bool QUrlPrivate::setHost(const QString &value, int from, int iend, bool maybePercentEncoded)
973 const QChar *begin = value.constData() + from;
974 const QChar *end = value.constData() + iend;
976 const int len = end - begin;
978 sectionIsPresent |= Host;
979 sectionHasError &= ~Host;
983 if (begin[0].unicode() == '[') {
984 // IPv6Address or IPvFuture
985 // smallest IPv6 address is "[::]" (len = 4)
986 // smallest IPvFuture address is "[v7.X]" (len = 6)
987 if (end[-1].unicode() != ']') {
988 sectionHasError |= Host;
989 errorCode = HostMissingEndBracket;
993 if (len > 5 && begin[1].unicode() == 'v') {
994 int c = parseIpFuture(host, begin, end);
996 sectionHasError |= Host;
997 errorCode = InvalidIPvFutureError;
998 errorSupplement = short(c);
1003 if (parseIp6(host, begin + 1, end - 1))
1006 sectionHasError |= Host;
1007 errorCode = begin[1].unicode() == 'v' ?
1008 InvalidIPvFutureError : InvalidIPv6AddressError;
1012 // check if it's an IPv4 address
1013 QIPAddressUtils::IPv4Address ip4;
1014 if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
1016 QIPAddressUtils::toString(host, ip4);
1017 sectionHasError &= ~Host;
1021 // This is probably a reg-name.
// But it can also be an encoded string that, when decoded, becomes one
1023 // of the types above.
1025 // Two types of encoding are possible:
1026 // percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
1027 // Unicode encoding (some non-ASCII characters case-fold to digits
1028 // when nameprepping is done)
1030 // The qt_ACE_do function below applies nameprepping and the STD3 check.
1031 // That means a Unicode string may become an IPv4 address, but it cannot
1032 // produce a '[' or a '%'.
1034 // check for percent-encoding first
1036 if (maybePercentEncoded && qt_urlRecode(s, begin, end, QUrl::DecodeReserved, 0)) {
1037 // something was decoded
1038 // anything encoded left?
1039 if (s.contains(QChar(0x25))) { // '%'
1040 sectionHasError |= Host;
1041 errorCode = InvalidRegNameError;
1046 return setHost(s, 0, s.length(), false);
1049 s = qt_ACE_do(QString::fromRawData(begin, len), NormalizeAce);
1051 sectionHasError |= Host;
1052 errorCode = InvalidRegNameError;
1057 if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
1058 QIPAddressUtils::toString(host, ip4);
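The comment in setHost() mentions that a percent-encoded host such as "%31%30%2E%30%2E%30%2E%31" decodes to "10.0.0.1". The sketch below shows that decoding step in isolation. It is an illustration under stated assumptions, not qt_urlRecode(): it works on std::string, decodes every well-formed "%HH" triplet, and copies malformed sequences through unchanged. The names hexValue and percentDecode are hypothetical.

```cpp
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes every well-formed "%HH" triplet; anything else is copied verbatim.
std::string percentDecode(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;     // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

In setHost() the decoded string is fed back into the host parser, and any '%' that survives decoding is reported as InvalidRegNameError; the sketch leaves that policy to the caller.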
1065 void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
1067 // URI-reference = URI / relative-ref
1068 // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
1069 // relative-ref = relative-part [ "?" query ] [ "#" fragment ]
1070 // hier-part = "//" authority path-abempty
1071 // / other path types
1072 // relative-part = "//" authority path-abempty
1073 // / other path types here
1075 sectionIsPresent = 0;
1076 sectionHasError = 0;
1078 // find the important delimiters
1082 const int len = url.length();
1083 const QChar *const begin = url.constData();
1084 const ushort *const data = reinterpret_cast<const ushort *>(begin);
1086 for (int i = 0; i < len; ++i) {
uint uc = data[i];
1088 if (uc == '#' && hash == -1) {
1091 // nothing more to be found
1095 if (question == -1) {
1096 if (uc == ':' && colon == -1)
1103 // check if we have a scheme
1105 if (colon != -1 && setScheme(url, colon)) {
1106 hierStart = colon + 1;
1108 // recover from a failed scheme: it might not have been a scheme at all
1110 sectionHasError = 0;
1111 sectionIsPresent = 0;
1116 int hierEnd = qMin<uint>(qMin<uint>(question, hash), len);
1117 if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
1118 // we have an authority, it ends at the first slash after these
1119 int authorityEnd = hierEnd;
1120 for (int i = hierStart + 2; i < authorityEnd ; ++i) {
1121 if (data[i] == '/') {
1127 setAuthority(url, hierStart + 2, authorityEnd);
1129 // even if we failed to set the authority properly, let's try to recover
1130 pathStart = authorityEnd;
1131 setPath(url, pathStart, hierEnd);
1137 pathStart = hierStart;
1139 if (hierStart < hierEnd)
1140 setPath(url, hierStart, hierEnd);
1145 if (uint(question) < uint(hash))
1146 setQuery(url, question + 1, qMin<uint>(hash, len));
1149 setFragment(url, hash + 1, len);
1151 if (sectionHasError || parsingMode == QUrl::TolerantMode)
1154 // The parsing so far was tolerant of errors, so the StrictMode
1155 // parsing is actually implemented here, as an extra post-check.
1156 // We only execute it if we haven't found any errors so far.
1158 // What we need to look out for, that the regular parser tolerates:
1159 // - percent signs not followed by two hex digits
1160 // - forbidden characters, which should always appear encoded
// '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL
1162 // control characters
1163 // - delimiters not allowed in certain positions
1164 // . scheme: parser is already strict
1165 // . user info: gen-delims (except for ':') disallowed
1166 // . host: parser is stricter than the standard
1167 // . port: parser is stricter than the standard
1168 // . path: all delimiters allowed
1169 // . fragment: all delimiters allowed
1170 // . query: all delimiters allowed
1171 // We would only need to check the user-info. However, the presence
1172 // of the disallowed gen-delims changes the parsing, so we don't
1173 // actually need to do anything
1174 static const char forbidden[] = "\"<>\\^`{|}\x7F";
1175 for (uint i = 0; i < uint(len); ++i) {
uint uc = data[i];
1180 if ((uc == '%' && (uint(len) < i + 2 || !isHex(data[i + 1]) || !isHex(data[i + 2])))
1181 || uc <= 0x20 || strchr(forbidden, uc)) {
1183 errorSupplement = uc;
1186 if (i > uint(hash)) {
1187 errorCode = InvalidFragmentError;
1188 sectionHasError |= Fragment;
1189 } else if (i > uint(question)) {
1190 errorCode = InvalidQueryError;
1191 sectionHasError |= Query;
1192 } else if (i > uint(pathStart)) {
1193 // pathStart is never -1
1194 errorCode = InvalidPathError;
1195 sectionHasError |= Path;
1197 // It must be in the authority, since the scheme is strict.
1198 // Since the port and hostname parsers are also strict,
1199 // the error can only have happened in the user info.
1200 int pos = url.indexOf(QLatin1Char(':'), hierStart);
1201 if (i > uint(pos)) {
1202 errorCode = InvalidPasswordError;
1203 sectionHasError |= Password;
1205 errorCode = InvalidUserNameError;
1206 sectionHasError |= UserName;
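The StrictMode post-check described above (stray percent signs, control characters and space, and the forbidden set) can be sketched as a standalone scan. This is a simplified illustration on std::string, not the code of QUrlPrivate::parse(): the function name firstStrictModeViolation is hypothetical, and it only locates the offending character without attributing it to a URL section.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not part of QUrl): true for 0-9, a-f, A-F.
static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Returns the index of the first character a strict parser would reject,
// or -1 if the string passes the check.
int firstStrictModeViolation(const std::string &url)
{
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    for (std::size_t i = 0; i < url.size(); ++i) {
        const unsigned char uc = static_cast<unsigned char>(url[i]);
        // a '%' must be followed by exactly two hex digits
        const bool badPercent = uc == '%'
            && (i + 2 >= url.size() || !isHexDigit(url[i + 1]) || !isHexDigit(url[i + 2]));
        // uc <= 0x20 is tested before strchr(), so NUL never reaches it
        if (badPercent || uc <= 0x20 || std::strchr(forbidden, uc) != nullptr)
            return static_cast<int>(i);
    }
    return -1;
}
```

The real parser goes one step further and compares the offending index with the positions of '#', '?' and the path start to pick the error code for the affected section.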
1214 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths
1216 Returns a merge of the current path with the relative path passed
1219 Note: \a relativePath is relative (does not start with '/').
1221 QString QUrlPrivate::mergePaths(const QString &relativePath) const
1223 // If the base URI has a defined authority component and an empty
1224 // path, then return a string consisting of "/" concatenated with
1225 // the reference's path; otherwise,
1226 if (!host.isEmpty() && path.isEmpty())
1227 return QLatin1Char('/') + relativePath;
1229 // Return a string consisting of the reference's path component
1230 // appended to all but the last segment of the base URI's path
1231 // (i.e., excluding any characters after the right-most "/" in the
1232 // base URI path, or excluding the entire base URI path if it does
1233 // not contain any "/" characters).
1235 if (!path.contains(QLatin1Char('/')))
1236 newPath = relativePath;
1238 newPath = path.leftRef(path.lastIndexOf(QLatin1Char('/')) + 1) + relativePath;
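The merge rule from RFC 3986, section 5.2.3, can be written as a small free function. The sketch below is an assumption-laden illustration rather than the member function above: it takes the base URI state as explicit parameters (a hypothetical baseHasAuthority flag and the base path) and uses std::string.

```cpp
#include <string>

// Sketch of RFC 3986, 5.2.3. relativePath must not start with '/'.
std::string mergePaths(bool baseHasAuthority,
                       const std::string &basePath,
                       const std::string &relativePath)
{
    // base has an authority and an empty path: "/" + reference path
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;

    // otherwise drop everything after the right-most '/' of the base path
    const std::string::size_type lastSlash = basePath.rfind('/');
    if (lastSlash == std::string::npos)
        return relativePath;            // base path has no '/' at all
    return basePath.substr(0, lastSlash + 1) + relativePath;
}
```

For example, merging "g" into the base path "/b/c/d;p" keeps "/b/c/" and yields "/b/c/g".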
1244 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.4: Remove dot segments
1246 Removes unnecessary ../ and ./ from the path. Used for normalizing
1249 static void removeDotsFromPath(QString *path)
1251 // The input buffer is initialized with the now-appended path
1252 // components and the output buffer is initialized to the empty
1254 QChar *out = path->data();
1255 const QChar *in = out;
1256 const QChar *end = out + path->size();
1258 // If the input buffer consists only of
1259 // "." or "..", then remove that from the input
1261 if (path->size() == 1 && in[0].unicode() == '.')
1263 else if (path->size() == 2 && in[0].unicode() == '.' && in[1].unicode() == '.')
1265 // While the input buffer is not empty, loop:
1268 // otherwise, if the input buffer begins with a prefix of "../" or "./",
1269 // then remove that prefix from the input buffer;
1270 if (path->size() >= 2 && in[0].unicode() == '.' && in[1].unicode() == '/')
1272 else if (path->size() >= 3 && in[0].unicode() == '.'
1273 && in[1].unicode() == '.' && in[2].unicode() == '/')
1276 // otherwise, if the input buffer begins with a prefix of
1277 // "/./" or "/.", where "." is a complete path segment,
1278 // then replace that prefix with "/" in the input buffer;
1279 if (in <= end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1280 && in[2].unicode() == '/') {
1283 } else if (in == end - 2 && in[0].unicode() == '/' && in[1].unicode() == '.') {
1284 *out++ = QLatin1Char('/');
1289 // otherwise, if the input buffer begins with a prefix
1290 // of "/../" or "/..", where ".." is a complete path
1291 // segment, then replace that prefix with "/" in the
// input buffer and remove the last segment and its
1293 // preceding "/" (if any) from the output buffer;
1294 if (in <= end - 4 && in[0].unicode() == '/' && in[1].unicode() == '.'
1295 && in[2].unicode() == '.' && in[3].unicode() == '/') {
1296 while (out > path->constData() && (--out)->unicode() != '/')
1298 if (out == path->constData() && out->unicode() != '/')
1302 } else if (in == end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1303 && in[2].unicode() == '.') {
1304 while (out > path->constData() && (--out)->unicode() != '/')
1306 if (out->unicode() == '/')
1312 // otherwise move the first path segment in
1313 // the input buffer to the end of the output
1314 // buffer, including the initial "/" character
1315 // (if any) and any subsequent characters up
1316 // to, but not including, the next "/"
1317 // character or the end of the input buffer.
1319 while (in < end && in->unicode() != '/')
1322 path->truncate(out - path->constData());
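For comparison, here is a compact sketch of the same RFC 3986, section 5.2.4 algorithm using separate input and output strings. It is not the in-place pointer version above; the names removeDotSegments and removeLastSegment are hypothetical, and clarity is favored over speed.

```cpp
#include <string>

// Removes the last segment of out together with its preceding '/', if any.
static void removeLastSegment(std::string &out)
{
    const std::string::size_type slash = out.rfind('/');
    if (slash == std::string::npos)
        out.clear();
    else
        out.erase(slash);
}

// Sketch of RFC 3986, 5.2.4 (steps A to E), with explicit buffers.
std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (in.compare(0, 3, "../") == 0) {
            in.erase(0, 3);                     // A: drop leading "../"
        } else if (in.compare(0, 2, "./") == 0) {
            in.erase(0, 2);                     // A: drop leading "./"
        } else if (in.compare(0, 3, "/./") == 0) {
            in.erase(0, 2);                     // B: "/./" becomes "/"
        } else if (in == "/.") {
            in = "/";                           // B: trailing "/." becomes "/"
        } else if (in.compare(0, 4, "/../") == 0) {
            in.erase(0, 3);                     // C: "/../" becomes "/"
            removeLastSegment(out);
        } else if (in == "/..") {
            in = "/";                           // C: trailing "/.." becomes "/"
            removeLastSegment(out);
        } else if (in == "." || in == "..") {
            in.clear();                         // D: lone "." or ".."
        } else {
            // E: move the first segment (with its leading '/') to the output
            const std::string::size_type next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

The two examples given in the RFC, "/a/b/c/./../../g" and "mid/content=5/../6", reduce to "/a/g" and "mid/6" respectively.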
1326 void QUrlPrivate::validate() const
QUrlPrivate *that = const_cast<QUrlPrivate *>(this);
1329 that->encodedOriginal = that->toEncoded(); // may detach
1332 QURL_SETFLAG(that->stateFlags, Validated);
1337 QString auth = authority(); // causes the non-encoded forms to be valid
1339 // authority() calls canonicalHost() which sets this
1343 if (scheme == QLatin1String("mailto")) {
1344 if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
1345 that->isValid = false;
that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username, "
"port and password"),
1350 } else if (scheme == ftpScheme() || scheme == httpScheme()) {
1351 if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
1352 that->isValid = false;
1353 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
1359 const QByteArray &QUrlPrivate::normalized() const
1361 if (QURL_HASFLAG(stateFlags, QUrlPrivate::Normalized))
1362 return encodedNormalized;
1364 QUrlPrivate *that = const_cast<QUrlPrivate *>(this);
1365 QURL_SETFLAG(that->stateFlags, QUrlPrivate::Normalized);
1367 QUrlPrivate tmp = *this;
1368 tmp.scheme = tmp.scheme.toLower();
1369 tmp.host = tmp.canonicalHost();
1371 // ensure the encoded and normalized parts of the URL
1372 tmp.ensureEncodedParts();
1373 if (tmp.encodedUserName.contains('%'))
1374 q_normalizePercentEncoding(&tmp.encodedUserName, userNameExcludeChars);
1375 if (tmp.encodedPassword.contains('%'))
1376 q_normalizePercentEncoding(&tmp.encodedPassword, passwordExcludeChars);
1377 if (tmp.encodedFragment.contains('%'))
1378 q_normalizePercentEncoding(&tmp.encodedFragment, fragmentExcludeChars);
1380 if (tmp.encodedPath.contains('%')) {
1381 // the path is a bit special:
1382 // the slashes shouldn't be encoded or decoded.
1383 // They should remain exactly like they are right now
1385 // treat the path as a slash-separated sequence of pchar
1387 result.reserve(tmp.encodedPath.length());
1388 if (tmp.encodedPath.startsWith('/'))
1391 const char *data = tmp.encodedPath.constData();
1396 nextSlash = tmp.encodedPath.indexOf('/', lastSlash);
1398 if (nextSlash == -1)
1399 len = tmp.encodedPath.length() - lastSlash;
1401 len = nextSlash - lastSlash;
1403 if (memchr(data + lastSlash, '%', len)) {
1404 // there's at least one percent before the next slash
1405 QByteArray block = QByteArray(data + lastSlash, len);
1406 q_normalizePercentEncoding(&block, pathExcludeChars);
1407 result.append(block);
1409 // no percents in this path segment, append wholesale
1410 result.append(data + lastSlash, len);
1413 // append the slash too, if it's there
1414 if (nextSlash != -1)
1417 lastSlash = nextSlash;
1418 } while (lastSlash != -1);
1420 tmp.encodedPath = result;
1423 if (!tmp.scheme.isEmpty()) // relative test
1424 removeDotsFromPath(&tmp.encodedPath);
1426 int qLen = tmp.query.length();
1427 for (int i = 0; i < qLen; i++) {
1428 if (qLen - i > 2 && tmp.query.at(i) == '%') {
1430 tmp.query[i] = qToLower(tmp.query.at(i));
1432 tmp.query[i] = qToLower(tmp.query.at(i));
1435 encodedNormalized = tmp.toEncoded();
1437 return encodedNormalized;
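The query loop near the end of normalized() lowercases the two characters that follow each '%'. A minimal standalone sketch of that loop, assuming std::string in place of the Qt string type and a hypothetical function name lowercasePercentHex, could look like this:

```cpp
#include <cctype>
#include <string>

// Lowercases the two characters following each '%', mirroring the loop above.
// A '%' with fewer than two characters after it is left untouched.
std::string lowercasePercentHex(std::string s)
{
    const std::size_t len = s.size();
    for (std::size_t i = 0; i < len; ++i) {
        if (len - i > 2 && s[i] == '%') {
            ++i;
            s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
            ++i;
            s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
        }
    }
    return s;
}
```

Only the hex digits change case; the rest of the query, including keys and values, is copied as is.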
1442 \macro QT_NO_URL_CAST_FROM_STRING
1445 Disables automatic conversions from QString (or char *) to QUrl.
1447 Compiling your code with this define is useful when you have a lot of
1448 code that uses QString for file names and you wish to convert it to
1449 use QUrl for network transparency. In any code that uses QUrl, it can
1450 help avoid missing QUrl::resolved() calls, and other misuses of
1451 QString to QUrl conversions.
1454 url = filename; // probably not what you want
1456 url = QUrl::fromLocalFile(filename);
1457 url = baseurl.resolved(QUrl(filename));
1460 \sa QT_NO_CAST_FROM_ASCII
Constructs a URL by parsing \a url. QUrl will automatically percent-encode
1466 all characters that are not allowed in a URL and decode the percent-encoded
1467 sequences that represent a character that is allowed in a URL.
1469 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1470 (the default), QUrl will correct certain mistakes, notably the presence of
1471 a percent character ('%') not followed by two hexadecimal digits, and it
1472 will accept any character in any position. In StrictMode, encoding mistakes
1473 will not be tolerated and QUrl will also check that certain forbidden
1474 characters are not present in unencoded form. If an error is detected in
1475 StrictMode, isValid() will return false. The parsing mode DecodedMode is not
1476 permitted in this context.
1480 \snippet code/src_corelib_io_qurl.cpp 0
1482 To construct a URL from an encoded string, call fromEncoded():
1484 \snippet code/src_corelib_io_qurl.cpp 1
1486 \sa setUrl(), setEncodedUrl(), fromEncoded(), TolerantMode
1488 QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(0)
1490 setUrl(url, parsingMode);
1494 Constructs an empty QUrl object.
1501 Constructs a copy of \a other.
1503 QUrl::QUrl(const QUrl &other) : d(other.d)
1510 Destructor; called immediately before the object is deleted.
1514 if (d && !d->ref.deref())
1519 Returns true if the URL is non-empty and valid; otherwise returns false.
1521 The URL is run through a conformance test. Every part of the URL
1522 must conform to the standard encoding rules of the URI standard
1523 for the URL to be reported as valid.
1525 \snippet code/src_corelib_io_qurl.cpp 2
1527 bool QUrl::isValid() const
1529 if (isEmpty()) return false;
1530 return d->sectionHasError == 0;
1534 Returns true if the URL has no data; otherwise returns false.
1536 bool QUrl::isEmpty() const
1538 if (!d) return true;
1539 return d->isEmpty();
1543 Resets the content of the QUrl. After calling this function, the
1544 QUrl is equal to one that has been constructed with the default
1549 if (d && !d->ref.deref())
1555 Parses \a url and sets this object to that value. QUrl will automatically
percent-encode all characters that are not allowed in a URL and decode the
1557 percent-encoded sequences that represent a character that is allowed in a
1560 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1561 (the default), QUrl will correct certain mistakes, notably the presence of
1562 a percent character ('%') not followed by two hexadecimal digits, and it
1563 will accept any character in any position. In StrictMode, encoding mistakes
1564 will not be tolerated and QUrl will also check that certain forbidden
1565 characters are not present in unencoded form. If an error is detected in
1566 StrictMode, isValid() will return false. The parsing mode DecodedMode is
1567 not permitted in this context and will produce a run-time warning.
1569 \sa url(), toString()
1571 void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
1573 if (parsingMode == DecodedMode) {
1574 qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
1577 d->parse(url, parsingMode);
1583 Sets the scheme of the URL to \a scheme. As a scheme can only
1584 contain ASCII characters, no conversion or encoding is done on the
1585 input. It must also start with an ASCII letter.
1587 The scheme describes the type (or protocol) of the URL. It's
represented by one or more ASCII characters at the start of the URL,
1589 and is followed by a ':'. The following example shows a URL where
1590 the scheme is "ftp":
1592 \image qurl-authority2.png
1594 The scheme can also be empty, in which case the URL is interpreted
1597 \sa scheme(), isRelative()
1599 void QUrl::setScheme(const QString &scheme)
1602 if (scheme.isEmpty()) {
1603 // schemes are not allowed to be empty
1604 d->sectionIsPresent &= ~QUrlPrivate::Scheme;
1605 d->sectionHasError &= ~QUrlPrivate::Scheme;
1608 d->setScheme(scheme, scheme.length());
1613 Returns the scheme of the URL. If an empty string is returned,
1614 this means the scheme is undefined and the URL is then relative.
The scheme can only contain US-ASCII letters, digits, '+', '-' and '.', which
means it cannot contain any character that would otherwise require encoding.
1619 \sa setScheme(), isRelative()
1621 QString QUrl::scheme() const
1623 if (!d) return QString();
1629 Sets the authority of the URL to \a authority.
1631 The authority of a URL is the combination of user info, a host
1632 name and a port. All of these elements are optional; an empty
1633 authority is therefore valid.
1635 The user info and host are separated by a '@', and the host and
1636 port are separated by a ':'. If the user info is empty, the '@'
1637 must be omitted; although a stray ':' is permitted if the port is
1640 The following example shows a valid authority string:
1642 \image qurl-authority.png
1644 The \a authority data is interpreted according to \a mode: in StrictMode,
1645 any '%' characters must be followed by exactly two hexadecimal characters
1646 and some characters (including space) are not allowed in undecoded form. In
1647 TolerantMode (the default), all characters are accepted in undecoded form
1648 and the tolerant parser will correct stray '%' not followed by two hex
characters. In DecodedMode, each '%' stands for itself and encoded characters
1650 are not possible. Because of that, in DecodedMode, it is not possible to
1651 use the delimiter characters as non-delimiters (e.g., a password containing
1654 \sa setUserInfo, setHost, setPort
1656 void QUrl::setAuthority(const QString &authority, ParsingMode mode)
1659 QString data = authority;
1660 if (mode == DecodedMode) {
1661 parseDecodedComponent(data);
1662 mode = TolerantMode;
1665 d->setAuthority(data, 0, data.length());
1666 if (authority.isNull()) {
1667 // QUrlPrivate::setAuthority cleared almost everything
1668 // but it leaves the Host bit set
1669 d->sectionIsPresent &= ~QUrlPrivate::Authority;
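The layout of an authority string (optional user info before '@', optional port after ':') can be sketched as a plain string split. This is an illustration under assumptions, not QUrlPrivate::setAuthority(): the struct AuthorityParts and the function splitAuthority are hypothetical, the split happens at the first '@', and a ':' only counts as the port separator when it comes after a closing ']', so that bracketed IPv6 literals stay intact.

```cpp
#include <string>

// Hypothetical result type for the sketch below.
struct AuthorityParts {
    std::string userInfo;
    std::string host;
    std::string port;
};

AuthorityParts splitAuthority(const std::string &authority)
{
    AuthorityParts parts;
    std::string hostPort = authority;

    // user info is everything before the first '@' (assumption of this sketch)
    const std::string::size_type at = authority.find('@');
    if (at != std::string::npos) {
        parts.userInfo = authority.substr(0, at);
        hostPort = authority.substr(at + 1);
    }

    // the port separator is the last ':' that is not inside "[...]"
    const std::string::size_type bracket = hostPort.rfind(']');
    const std::string::size_type colon = hostPort.rfind(':');
    if (colon != std::string::npos
        && (bracket == std::string::npos || colon > bracket)) {
        parts.port = hostPort.substr(colon + 1);
        hostPort.erase(colon);
    }

    parts.host = hostPort;
    return parts;
}
```

No validation is done here; each part would still need to go through the user info, host and port checks.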
1674 Returns the authority of the URL if it is defined; otherwise
1675 an empty string is returned.
1677 The \a options argument controls how to format the authority portion of the
1678 URL. The value of QUrl::FullyDecoded should be avoided, since it may
1679 produce an ambiguous return value (for example, if the username contains a
1680 colon ':' or either the username or password contain an at-sign '@'). In
1681 all other cases, this function returns an unambiguous value, which may
1682 contain those characters still percent-encoded, plus some control
1683 sequences not representable in decoded form in QString.
1685 \sa setAuthority(), userInfo(), userName(), password(), host(), port()
1687 QString QUrl::authority(ComponentFormattingOptions options) const
1689 if (!d) return QString();
1692 d->appendAuthority(result, options, QUrlPrivate::Authority);
1697 Sets the user info of the URL to \a userInfo. The user info is an
1698 optional part of the authority of the URL, as described in
1701 The user info consists of a user name and optionally a password,
1702 separated by a ':'. If the password is empty, the colon must be
1703 omitted. The following example shows a valid user info string:
1705 \image qurl-authority3.png
1707 The \a userInfo data is interpreted according to \a mode: in StrictMode,
1708 any '%' characters must be followed by exactly two hexadecimal characters
1709 and some characters (including space) are not allowed in undecoded form. In
1710 TolerantMode (the default), all characters are accepted in undecoded form
1711 and the tolerant parser will correct stray '%' not followed by two hex
characters. In DecodedMode, each '%' stands for itself and encoded characters
are not possible. Because of that, in DecodedMode, it is not possible to
use the ':' delimiter character as a non-delimiter in the user name.
1716 \sa userInfo(), setUserName(), setPassword(), setAuthority()
1718 void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
1721 QString trimmed = userInfo.trimmed();
1722 if (mode == DecodedMode) {
1723 parseDecodedComponent(trimmed);
1724 mode = TolerantMode;
1727 d->setUserInfo(trimmed, 0, trimmed.length());
1728 if (userInfo.isNull()) {
1729 // QUrlPrivate::setUserInfo cleared almost everything
1730 // but it leaves the UserName bit set
1731 d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
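The reason a decoded user name cannot contain ':' becomes visible when the user info is split. The sketch below assumes the split happens at the first ':' (an assumption for illustration; the private setUserInfo() is not shown in this section) and uses the hypothetical names UserInfoParts and splitUserInfo.

```cpp
#include <string>

// Hypothetical result type for the sketch below.
struct UserInfoParts {
    std::string userName;
    std::string password;
    bool hasPassword;
};

// Splits "user:password" at the first ':'; later colons belong to the password.
UserInfoParts splitUserInfo(const std::string &userInfo)
{
    UserInfoParts parts;
    const std::string::size_type colon = userInfo.find(':');
    parts.hasPassword = (colon != std::string::npos);
    parts.userName = userInfo.substr(0, colon);
    if (parts.hasPassword)
        parts.password = userInfo.substr(colon + 1);
    return parts;
}
```

With this rule, a user name that really contains a colon has to be passed in percent-encoded form ("%3A"), which is only possible outside DecodedMode.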
1736 Returns the user info of the URL, or an empty string if the user
1739 The \a options argument controls how to format the user info component. The
1740 value of QUrl::FullyDecoded should be avoided, since it may produce an
1741 ambiguous return value (for example, if the username contains a colon ':').
1742 In all other cases, this function returns an unambiguous value, which may
contain that character still percent-encoded, plus some control sequences
1744 not representable in decoded form in QString.
1746 \sa setUserInfo(), userName(), password(), authority()
1748 QString QUrl::userInfo(ComponentFormattingOptions options) const
1750 if (!d) return QString();
1753 d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
1758 Sets the URL's user name to \a userName. The \a userName is part
1759 of the user info element in the authority of the URL, as described
1762 The \a userName data is interpreted according to \a mode: in StrictMode,
1763 any '%' characters must be followed by exactly two hexadecimal characters
1764 and some characters (including space) are not allowed in undecoded form. In
1765 TolerantMode (the default), all characters are accepted in undecoded form
1766 and the tolerant parser will correct stray '%' not followed by two hex
characters. In DecodedMode, each '%' stands for itself and encoded characters
1770 QUrl::DecodedMode should be used when setting the user name from a data
1771 source which is not a URL, such as a password dialog shown to the user or
1772 with a user name obtained by calling userName() with the QUrl::FullyDecoded
1775 \sa userName(), setUserInfo()
1777 void QUrl::setUserName(const QString &userName, ParsingMode mode)
1781 QString data = userName;
1782 if (mode == DecodedMode) {
1783 parseDecodedComponent(data);
1784 mode = TolerantMode;
1788 d->setUserName(data, 0, data.length());
1789 if (userName.isNull())
1790 d->sectionIsPresent &= ~QUrlPrivate::UserName;
1794 Returns the user name of the URL if it is defined; otherwise
1795 an empty string is returned.
1797 The \a options argument controls how to format the user name component. All
1798 values produce an unambiguous result. With QUrl::FullyDecoded, all
1799 percent-encoded sequences are decoded; otherwise, the returned value may
1800 contain some percent-encoded sequences for some control sequences not
1801 representable in decoded form in QString.
1803 Note that QUrl::FullyDecoded may cause data loss if those non-representable
1804 sequences are present. It is recommended to use that value when the result
1805 will be used in a non-URL context, such as setting in QAuthenticator or
1806 negotiating a login.
1808 \sa setUserName(), userInfo()
1810 QString QUrl::userName(ComponentFormattingOptions options) const
1812 if (!d) return QString();
1815 d->appendUserName(result, options);
1820 Sets the URL's password to \a password. The \a password is part of
1821 the user info element in the authority of the URL, as described in
1824 The \a password data is interpreted according to \a mode: in StrictMode,
1825 any '%' characters must be followed by exactly two hexadecimal characters
1826 and some characters (including space) are not allowed in undecoded form. In
1827 TolerantMode, all characters are accepted in undecoded form and the
1828 tolerant parser will correct stray '%' not followed by two hex characters.
In DecodedMode, each '%' stands for itself and encoded characters are not
1832 QUrl::DecodedMode should be used when setting the password from a data
1833 source which is not a URL, such as a password dialog shown to the user or
1834 with a password obtained by calling password() with the QUrl::FullyDecoded
1837 \sa password(), setUserInfo()
1839 void QUrl::setPassword(const QString &password, ParsingMode mode)
1843 QString data = password;
1844 if (mode == DecodedMode) {
1845 parseDecodedComponent(data);
1846 mode = TolerantMode;
1849 d->setPassword(data, 0, data.length());
1850 if (password.isNull())
1851 d->sectionIsPresent &= ~QUrlPrivate::Password;
1855 Returns the password of the URL if it is defined; otherwise
1856 an empty string is returned.
The \a options argument controls how to format the password component. All
1859 values produce an unambiguous result. With QUrl::FullyDecoded, all
1860 percent-encoded sequences are decoded; otherwise, the returned value may
1861 contain some percent-encoded sequences for some control sequences not
1862 representable in decoded form in QString.
1864 Note that QUrl::FullyDecoded may cause data loss if those non-representable
1865 sequences are present. It is recommended to use that value when the result
1866 will be used in a non-URL context, such as setting in QAuthenticator or
1867 negotiating a login.
1871 QString QUrl::password(ComponentFormattingOptions options) const
1873 if (!d) return QString();
1876 d->appendPassword(result, options);
1881 Sets the host of the URL to \a host. The host is part of the
1884 The \a host data is interpreted according to \a mode: in StrictMode,
1885 any '%' characters must be followed by exactly two hexadecimal characters
1886 and some characters (including space) are not allowed in undecoded form. In
1887 TolerantMode, all characters are accepted in undecoded form and the
1888 tolerant parser will correct stray '%' not followed by two hex characters.
In DecodedMode, each '%' stands for itself and encoded characters are not
1892 Note that, in all cases, the result of the parsing must be a valid hostname
1893 according to STD 3 rules, as modified by the Internationalized Resource
1894 Identifiers specification (RFC 3987). Invalid hostnames are not permitted
1895 and will cause isValid() to become false.
1897 \sa host(), setAuthority()
1899 void QUrl::setHost(const QString &host, ParsingMode mode)
1903 QString data = host;
1904 if (mode == DecodedMode) {
1905 parseDecodedComponent(data);
1906 mode = TolerantMode;
1909 if (d->setHost(data, 0, data.length())) {
1911 d->sectionIsPresent &= ~QUrlPrivate::Host;
1912 } else if (!data.startsWith(QLatin1Char('['))) {
1913 // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
1914 ushort oldCode = d->errorCode;
1915 ushort oldSupplement = d->errorSupplement;
1916 data.prepend(QLatin1Char('['));
1917 data.append(QLatin1Char(']'));
1918 if (!d->setHost(data, 0, data.length())) {
1919 // failed again: choose if this was an IPv6 error or not
1920 if (!data.contains(QLatin1Char(':'))) {
1921 d->errorCode = oldCode;
1922 d->errorSupplement = oldSupplement;
1929 Returns the host of the URL if it is defined; otherwise
1930 an empty string is returned.
1932 The \a options argument controls how the hostname will be formatted. The
1933 QUrl::EncodeUnicode option will cause this function to return the hostname
1934 in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
1935 channels that are not 8-bit clean or that require the legacy hostname (such
1936 as DNS requests or in HTTP request headers). If that flag is not present,
1937 this function returns the International Domain Name (IDN) in Unicode form,
1938 according to the list of permissible top-level domains (see
1941 All other flags are ignored. Host names cannot contain control or percent
1942 characters, so the returned value can be considered fully decoded.
1944 \sa setHost(), idnWhiteList(), setIdnWhiteList(), authority()
1946 QString QUrl::host(ComponentFormattingOptions options) const
1948 if (!d) return QString();
1951 d->appendHost(result, options);
1952 if (result.startsWith(QLatin1Char('[')))
1953 return result.mid(1, result.length() - 2);
1958 Sets the port of the URL to \a port. The port is part of the
1959 authority of the URL, as described in setAuthority().
1961 \a port must be between 0 and 65535 inclusive. Setting the
1962 port to -1 indicates that the port is unspecified.
1964 void QUrl::setPort(int port)
1968 if (port < -1 || port > 65535) {
1969 qWarning("QUrl::setPort: Out of range");
1971 d->sectionHasError |= QUrlPrivate::Port;
1972 d->errorCode = QUrlPrivate::InvalidPortError;
1974 d->sectionHasError &= ~QUrlPrivate::Port;
1983 Returns the port of the URL, or \a defaultPort if the port is
1988 \snippet code/src_corelib_io_qurl.cpp 3
1990 int QUrl::port(int defaultPort) const
1992 if (!d) return defaultPort;
1993 return d->port == -1 ? defaultPort : d->port;
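The port contract described above (valid range 0 to 65535, -1 meaning "unspecified", and a caller-supplied default on read) can be modeled with a small struct. This is a sketch with hypothetical names (PortField, setPort, portOr), and it assumes that an out-of-range value is stored as -1 in addition to raising the error flag.

```cpp
// Hypothetical stand-in for the port-related state of a URL.
struct PortField {
    int port = -1;          // -1 means "unspecified"
    bool hasError = false;
};

// Accepts -1 and 0..65535; anything else sets the error flag.
void setPort(PortField &f, int port)
{
    if (port < -1 || port > 65535) {
        f.port = -1;        // assumption: out-of-range input leaves the port unset
        f.hasError = true;
    } else {
        f.port = port;
        f.hasError = false;
    }
}

// Returns the stored port, or defaultPort when none is set.
int portOr(const PortField &f, int defaultPort)
{
    return f.port == -1 ? defaultPort : f.port;
}
```

A caller that wants the scheme default, for example 80 for HTTP, passes it as defaultPort instead of testing for -1.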
1997 Sets the path of the URL to \a path. The path is the part of the
1998 URL that comes after the authority but before the query string.
2000 \image qurl-ftppath.png
2002 For non-hierarchical schemes, the path will be everything
2003 following the scheme declaration, as in the following example:
2005 \image qurl-mailtopath.png
2007 The \a path data is interpreted according to \a mode: in StrictMode,
2008 any '%' characters must be followed by exactly two hexadecimal characters
2009 and some characters (including space) are not allowed in undecoded form. In
2010 TolerantMode (the default), all characters are accepted in undecoded form and the
2011 tolerant parser will correct stray '%' not followed by two hex characters.
In DecodedMode, each '%' stands for itself and encoded characters are not
2015 QUrl::DecodedMode should be used when setting the path from a data source
2016 which is not a URL, such as a dialog shown to the user or with a path
2017 obtained by calling path() with the QUrl::FullyDecoded formatting option.
2021 void QUrl::setPath(const QString &path, ParsingMode mode)
2025 QString data = path;
2026 if (mode == DecodedMode) {
2027 parseDecodedComponent(data);
2028 mode = TolerantMode;
2031 d->setPath(data, 0, data.length());
2033 // optimized out, since there is no path delimiter
2034 // if (path.isNull())
2035 // d->sectionIsPresent &= ~QUrlPrivate::Path;
2039 Returns the path of the URL.
2041 The \a options argument controls how to format the path component. All
2042 values produce an unambiguous result. With QUrl::FullyDecoded, all
2043 percent-encoded sequences are decoded; otherwise, the returned value may
2044 contain some percent-encoded sequences for some control sequences not
2045 representable in decoded form in QString.
2047 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2048 sequences are present. It is recommended to use that value when the result
2049 will be used in a non-URL context, such as sending to an FTP server.
2053 QString QUrl::path(ComponentFormattingOptions options) const
2055 if (!d) return QString();
2058 d->appendPath(result, options, QUrlPrivate::Path);
2065 Returns true if this URL contains a query (that is, if a '?' was seen in it).
2067 \sa setQuery(), query(), hasFragment()
2069 bool QUrl::hasQuery() const
2071 if (!d) return false;
2072 return d->hasQuery();
2076 Sets the query string of the URL to \a query.
2078 This function is useful if you need to pass a query string that
2079 does not fit into the key-value pattern, or that uses a different
2080 scheme for encoding special characters than what is suggested by
2083 Passing a value of QString() to \a query (a null QString) unsets
2084 the query completely. However, passing a value of QString("")
2085 will set the query to an empty value, as if the original URL
2088 The \a query data is interpreted according to \a mode: in StrictMode,
2089 any '%' characters must be followed by exactly two hexadecimal characters
2090 and some characters (including space) are not allowed in undecoded form. In
2091 TolerantMode, all characters are accepted in undecoded form and the
2092 tolerant parser will correct stray '%' not followed by two hex characters.
2093 In DecodedMode, '%' stand for themselves and encoded characters are not
2096 Query strings often contain percent-encoded sequences, so use of
2097 DecodedMode is discouraged. One special sequence to be aware of is that of
2098 the plus character ('+'). QUrl does not convert spaces to plus characters,
2099 even though HTML forms posted by web browsers do. In order to represent an
2100 actual plus character in a query, the sequence "%2B" is usually used. This
2101 function will leave "%2B" sequences untouched in TolerantMode or
2104 \sa query(), hasQuery()
2106 void QUrl::setQuery(const QString &query, ParsingMode mode)
2110 QString data = query;
2111 if (mode == DecodedMode) {
2112 parseDecodedComponent(data);
2113 mode = TolerantMode;
2116 d->setQuery(data, 0, data.length());
2118 d->sectionIsPresent &= ~QUrlPrivate::Query;
2124 Sets the query string of the URL to \a query.
2126 This function reconstructs the query string from the QUrlQuery object and
2127 sets it on this QUrl object. This function does not have parsing parameters
2128 because the QUrlQuery contains data that is already parsed.
2130 \sa query(), hasQuery()
2132 void QUrl::setQuery(const QUrlQuery &query)
2136 // we know the data is in the right format
2137 d->query = query.toString();
2138 if (query.isEmpty())
2139 d->sectionIsPresent &= ~QUrlPrivate::Query;
2141 d->sectionIsPresent |= QUrlPrivate::Query;
2145 Returns the query string of the URL if there's a query string, or an empty
2146 result if not. To determine if the parsed URL contained a query string, use
2149 The \a options argument controls how to format the query component. All
2150 values produce an unambiguous result. With QUrl::FullyDecoded, all
2151 percent-encoded sequences are decoded; otherwise, the returned value may
2152 contain some percent-encoded sequences for some control sequences not
2153 representable in decoded form in QString.
2155 Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
2156 often contain data that is supposed to remain percent-encoded, including
2157 the use of the "%2B" sequence to represent a plus character ('+').
2159 \sa setQuery(), hasQuery()
2161 QString QUrl::query(ComponentFormattingOptions options) const
2163 if (!d) return QString();
2166 d->appendQuery(result, options, QUrlPrivate::Query);
2167 if (d->hasQuery() && result.isNull())
2173 Sets the fragment of the URL to \a fragment. The fragment is the
2174 last part of the URL, represented by a '#' followed by a string of
2175 characters. It is typically used in HTTP for referring to a
2176 certain link or point on a page:
2178 \image qurl-fragment.png
2180 The fragment is sometimes also referred to as the URL "reference".
2182 Passing an argument of QString() (a null QString) will unset the fragment.
2183 Passing an argument of QString("") (an empty but not null QString)
2184 will set the fragment to an empty string (as if the original URL
2187 The \a fragment data is interpreted according to \a mode: in StrictMode,
2188 any '%' characters must be followed by exactly two hexadecimal characters
2189 and some characters (including space) are not allowed in undecoded form. In
2190 TolerantMode, all characters are accepted in undecoded form and the
2191 tolerant parser will correct stray '%' not followed by two hex characters.
2192 In DecodedMode, '%' stand for themselves and encoded characters are not
2195 QUrl::DecodedMode should be used when setting the fragment from a data
2196 source which is not a URL or with a fragment obtained by calling
2197 fragment() with the QUrl::FullyDecoded formatting option.
2199 \sa fragment(), hasFragment()
2201 void QUrl::setFragment(const QString &fragment, ParsingMode mode)
2205 QString data = fragment;
2206 if (mode == DecodedMode) {
2207 parseDecodedComponent(data);
2208 mode = TolerantMode;
2211 d->setFragment(data, 0, data.length());
2212 if (fragment.isNull())
2213 d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2217 Returns the fragment of the URL. To determine if the parsed URL contained a
2218 fragment, use hasFragment().
2220 The \a options argument controls how to format the fragment component. All
2221 values produce an unambiguous result. With QUrl::FullyDecoded, all
2222 percent-encoded sequences are decoded; otherwise, the returned value may
2223 contain some percent-encoded sequences for some control sequences not
2224 representable in decoded form in QString.
2226 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2227 sequences are present. It is recommended to use that value when the result
2228 will be used in a non-URL context.
2230 \sa setFragment(), hasFragment()
2232 QString QUrl::fragment(ComponentFormattingOptions options) const
2234 if (!d) return QString();
2237 d->appendFragment(result, options);
2238 if (d->hasFragment() && result.isNull())
2246 Returns true if this URL contains a fragment (that is, if a '#' was seen in it).
2248 \sa fragment(), setFragment()
2250 bool QUrl::hasFragment() const
2252 if (!d) return false;
2253 return d->hasFragment();
2259 Returns the TLD (Top-Level Domain) of the URL (for example, .co.uk or .net).
2260 Note that the return value is prefixed with a '.' unless the
2261 URL does not contain a valid TLD, in which case the function returns
2264 QString QUrl::topLevelDomain(ComponentFormattingOptions options) const
2266 QString tld = qTopLevelDomain(host());
2267 if (options & EncodeUnicode) {
2268 return qt_ACE_do(tld, ToAceOnly);
2274 Returns the result of the merge of this URL with \a relative. This
2275 URL is used as a base to convert \a relative to an absolute URL.
2277 If \a relative is not a relative URL, this function will return \a
2278 relative directly. Otherwise, the paths of the two URLs are
2279 merged, and the new URL returned has the scheme and authority of
2280 the base URL, but with the merged path, as in the following
2283 \snippet code/src_corelib_io_qurl.cpp 5
2285 Calling resolved() with ".." returns a QUrl whose directory is
2286 one level higher than the original. Similarly, calling resolved()
2287 with "../.." removes two levels from the path. If \a relative is
2288 "/", the path becomes "/".
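The path handling described above can be sketched with the standard library
alone. The example below merges a relative path onto a base path and then
removes "." and ".." segments following the procedure in RFC 3986 section
5.2.4. The names removeDotSegments and resolvePath are hypothetical and this
is not QUrl's implementation. It only handles the path component and ignores
scheme, authority, query and fragment.

```cpp
#include <cstddef>
#include <string>

// Drops the last segment of 'out' together with its preceding '/'.
static void popLastSegment(std::string &out)
{
    const std::size_t slash = out.rfind('/');
    out.erase(slash == std::string::npos ? 0 : slash);
}

// Hypothetical helper: removes "." and ".." segments (RFC 3986, 5.2.4).
std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (in.compare(0, 3, "../") == 0) {
            in.erase(0, 3);
        } else if (in.compare(0, 2, "./") == 0) {
            in.erase(0, 2);
        } else if (in.compare(0, 3, "/./") == 0) {
            in.erase(0, 2);                 // "/./x" becomes "/x"
        } else if (in == "/.") {
            in = "/";
        } else if (in.compare(0, 4, "/../") == 0) {
            in.erase(0, 3);                 // "/../x" becomes "/x"
            popLastSegment(out);
        } else if (in == "/..") {
            in = "/";
            popLastSegment(out);
        } else if (in == "." || in == "..") {
            in.clear();
        } else {
            // Move the first segment (with its leading '/') to the output.
            const std::size_t next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}

// Hypothetical helper: resolves 'relative' against 'base' (paths only).
std::string resolvePath(const std::string &base, const std::string &relative)
{
    if (relative.empty())
        return base;
    if (relative[0] == '/')
        return removeDotSegments(relative);
    // Keep everything in the base up to and including its last '/'.
    const std::size_t slash = base.rfind('/');
    const std::string dir = slash == std::string::npos ? std::string()
                                                       : base.substr(0, slash + 1);
    return removeDotSegments(dir + relative);
}
```

With a base path of "/a/b/c", resolving ".." yields "/a/", "../.." yields
"/", and "/" yields "/", which matches the behavior described above.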
2292 QUrl QUrl::resolved(const QUrl &relative) const
2294 if (!d) return relative;
2295 if (!relative.d) return *this;
2298 // be non strict and allow scheme in relative url
2299 if (!relative.d->scheme.isEmpty() && relative.d->scheme != d->scheme) {
2302 if (relative.d->hasAuthority()) {
2305 t.d = new QUrlPrivate;
2307 // copy the authority
2308 t.d->userName = d->userName;
2309 t.d->password = d->password;
2310 t.d->host = d->host;
2311 t.d->port = d->port;
2312 t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;
2314 if (relative.d->path.isEmpty()) {
2315 t.d->path = d->path;
2316 if (relative.d->hasQuery()) {
2317 t.d->query = relative.d->query;
2318 t.d->sectionIsPresent |= QUrlPrivate::Query;
2319 } else if (d->hasQuery()) {
2320 t.d->query = d->query;
2321 t.d->sectionIsPresent |= QUrlPrivate::Query;
2324 t.d->path = relative.d->path.startsWith(QLatin1Char('/'))
2326 : d->mergePaths(relative.d->path);
2327 if (relative.d->hasQuery()) {
2328 t.d->query = relative.d->query;
2329 t.d->sectionIsPresent |= QUrlPrivate::Query;
2333 t.d->scheme = d->scheme;
2335 t.d->sectionIsPresent |= QUrlPrivate::Scheme;
2337 t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
2339 t.d->fragment = relative.d->fragment;
2340 if (relative.d->hasFragment())
2341 t.d->sectionIsPresent |= QUrlPrivate::Fragment;
2343 t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2345 removeDotsFromPath(&t.d->path);
2347 #if defined(QURL_DEBUG)
2348 qDebug("QUrl(\"%s\").resolved(\"%s\") = \"%s\"",
2350 qPrintable(relative.url()),
2351 qPrintable(t.url()));
2357 Returns true if the URL is relative; otherwise returns false. A URL is a
2358 relative reference if its scheme is undefined; this function is therefore
2359 equivalent to calling scheme().isEmpty().
2361 Relative references are defined in RFC 3986 section 4.2.
2363 bool QUrl::isRelative() const
2365 if (!d) return true;
2366 return !d->hasScheme();
2370 Returns a string representation of the URL. The output can be customized by
2371 passing flags with \a options. The option QUrl::FullyDecoded is not
2372 permitted in this function since it would generate ambiguous data.
2374 The resulting QString can be passed back to a QUrl later on.
2376 Synonym for toString(options).
2378 \sa FormattingOptions, toEncoded(), toString()
2380 QString QUrl::url(FormattingOptions options) const
2382 return toString(options);
2386 Returns a string representation of the URL. The output can be customized by
2387 passing flags with \a options. The option QUrl::FullyDecoded is not
2388 permitted in this function since it would generate ambiguous data.
2390 The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.
2392 \sa FormattingOptions, url(), setUrl()
2394 QString QUrl::toString(FormattingOptions options) const
2396 if (!d) return QString();
2397 if (options == QUrl::FullyDecoded) {
2398 qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
2399 options = QUrl::PrettyDecoded;
2402 // return just the path if:
2403 // - QUrl::PreferLocalFile is passed
2404 // - QUrl::RemovePath isn't passed (rather stupid if the user did...)
2405 // - there's no query or fragment to return
2406 // that is, either they aren't present, or we're removing them
2407 // - it's a local file
2408 // (test done last since it's the most expensive)
2409 if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
2410 && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
2411 && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
2413 return path(options);
2418 // for the full URL, we consider that the reserved characters are prettier if encoded
2419 if (options & DecodeReserved)
2420 options &= ~EncodeReserved;
2422 options |= EncodeReserved;
2424 if (!(options & QUrl::RemoveScheme) && d->hasScheme())
2425 url += d->scheme + QLatin1Char(':');
2427 bool pathIsAbsolute = d->path.startsWith(QLatin1Char('/'));
2428 if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
2429 url += QLatin1String("//");
2430 d->appendAuthority(url, options, QUrlPrivate::FullUrl);
2431 } else if (isLocalFile() && pathIsAbsolute) {
2432 url += QLatin1String("//");
2435 if (!(options & QUrl::RemovePath)) {
2436 // check if we need to insert a slash
2437 if (!pathIsAbsolute && !d->path.isEmpty() && !url.isEmpty() && !url.endsWith(QLatin1Char(':')))
2438 url += QLatin1Char('/');
2440 d->appendPath(url, options, QUrlPrivate::FullUrl);
2441 // check if we need to remove trailing slashes
2442 if ((options & StripTrailingSlash) && !d->path.isEmpty() && d->path != QLatin1String("/") && url.endsWith(QLatin1Char('/')))
2446 if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
2447 url += QLatin1Char('?');
2448 d->appendQuery(url, options, QUrlPrivate::FullUrl);
2450 if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
2451 url += QLatin1Char('#');
2452 d->appendFragment(url, options);
2461 Returns a human-displayable string representation of the URL.
2462 The output can be customized by passing flags with \a options.
2463 The option RemovePassword is always enabled, since passwords
2464 should never be shown back to users.
2466 With the default options, the resulting QString can be passed back
2467 to a QUrl later on, but any password that was present initially will
2470 \sa FormattingOptions, toEncoded(), toString()
2473 QString QUrl::toDisplayString(FormattingOptions options) const
2475 return toString(options | RemovePassword);
2479 Returns the encoded representation of the URL if it's valid;
2480 otherwise an empty QByteArray is returned. The output can be
2481 customized by passing flags with \a options.
2483 The user info, path and fragment are all converted to UTF-8, and
2484 all non-ASCII characters are then percent encoded. The host name
2485 is encoded using Punycode.
2487 QByteArray QUrl::toEncoded(FormattingOptions options) const
2489 options &= ~(FullyDecoded | FullyEncoded);
2490 QString stringForm = toString(options | FullyEncoded);
2491 return stringForm.toLatin1();
2495 \fn QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode parsingMode)
2497 Parses \a input and returns the corresponding QUrl. \a input is
2498 assumed to be in encoded form, containing only ASCII characters.
2500 Parses the URL using \a parsingMode. See setUrl() for more information on
2501 this parameter. QUrl::DecodedMode is not permitted in this context.
2503 \sa toEncoded(), setUrl()
2505 QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
2507 return QUrl(QString::fromUtf8(input.constData(), input.size()), mode);
2511 Returns a decoded copy of \a input. \a input is first decoded from
2512 percent encoding, then converted from UTF-8 to Unicode.
2514 QString QUrl::fromPercentEncoding(const QByteArray &input)
2516 return QString::fromUtf8(QByteArray::fromPercentEncoding(input));
2520 Returns an encoded copy of \a input. \a input is first converted
2521 to UTF-8, and all ASCII characters that are not in the unreserved group
2522 are percent encoded. To prevent characters from being percent encoded
2523 pass them to \a exclude. To force characters to be percent encoded pass
2526 Unreserved is defined as:
2527 ALPHA / DIGIT / "-" / "." / "_" / "~"
2529 \snippet code/src_corelib_io_qurl.cpp 6
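The encoding rule above can also be illustrated without Qt. The sketch below
takes a string that is assumed to already be UTF-8 and percent-encodes every
byte outside the unreserved set. The function name percentEncode is
hypothetical. Emitting uppercase hexadecimal digits and letting \a include
take precedence over \a exclude are assumptions of this sketch, not
statements about the library's behavior.

```cpp
#include <string>

// Hypothetical helper, not part of QUrl: percent-encodes the bytes of a
// UTF-8 string, leaving unreserved characters (and those listed in
// 'exclude') untouched, and always encoding those listed in 'include'.
std::string percentEncode(const std::string &utf8,
                          const std::string &exclude = "",
                          const std::string &include = "")
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const char ch = static_cast<char>(c);
        // Unreserved: ALPHA / DIGIT / "-" / "." / "_" / "~"
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9')
                             || c == '-' || c == '.' || c == '_' || c == '~';
        const bool excluded = exclude.find(ch) != std::string::npos;
        const bool included = include.find(ch) != std::string::npos;
        if ((unreserved || excluded) && !included) {
            out += ch;
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

For example, percentEncode("a b") gives "a%20b", while
percentEncode("a/b", "/") leaves the slash alone and returns "a/b".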
2531 QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
2533 return input.toUtf8().toPercentEncoding(exclude, include);
2537 \fn QByteArray QUrl::toPunycode(const QString &uc)
2539 Returns \a uc in Punycode encoding.
2541 Punycode is a Unicode encoding used for internationalized domain
2542 names, as defined in RFC3492. If you want to convert a domain name from
2543 Unicode to its ASCII-compatible representation, use toAce().
2547 \fn QString QUrl::fromPunycode(const QByteArray &pc)
2549 Returns the Punycode decoded representation of \a pc.
2551 Punycode is a Unicode encoding used for internationalized domain
2552 names, as defined in RFC3492. If you want to convert a domain from
2553 its ASCII-compatible encoding to the Unicode representation, use
2560 Returns the Unicode form of the given domain name
2561 \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
2562 The result of this function is considered equivalent to \a domain.
2564 If the value in \a domain cannot be encoded, it will be converted
2565 to QString and returned.
2567 The ASCII Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
2568 and RFC 3492. It is part of the Internationalizing Domain Names in
2569 Applications (IDNA) specification, which allows for domain names
2570 (like \c "example.com") to be written using international
2573 QString QUrl::fromAce(const QByteArray &domain)
2575 return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce);
2581 Returns the ASCII Compatible Encoding of the given domain name \a domain.
2582 The result of this function is considered equivalent to \a domain.
2584 The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
2585 and RFC 3492. It is part of the Internationalizing Domain Names in
2586 Applications (IDNA) specification, which allows for domain names
2587 (like \c "example.com") to be written using international
2590 This function returns an empty QByteArray if \a domain is not a valid
2591 hostname. Note, in particular, that IPv6 literals are not valid domain
2594 QByteArray QUrl::toAce(const QString &domain)
2596 QString result = qt_ACE_do(domain, ToAceOnly);
2597 return result.toLatin1();
2603 Returns true if this URL is "less than" the given \a url. This
2604 provides a means of ordering URLs.
2606 bool QUrl::operator <(const QUrl &url) const
2609 bool thisIsEmpty = !d || d->isEmpty();
2610 bool thatIsEmpty = !url.d || url.d->isEmpty();
2612 // sort an empty URL first
2613 return thisIsEmpty && !thatIsEmpty;
2617 cmp = d->scheme.compare(url.d->scheme);
2621 cmp = d->userName.compare(url.d->userName);
2625 cmp = d->password.compare(url.d->password);
2629 cmp = d->host.compare(url.d->host);
2633 if (d->port != url.d->port)
2634 return d->port < url.d->port;
2636 cmp = d->path.compare(url.d->path);
2640 if (d->hasQuery() != url.d->hasQuery())
2641 return url.d->hasQuery();
2643 cmp = d->query.compare(url.d->query);
2647 if (d->hasFragment() != url.d->hasFragment())
2648 return url.d->hasFragment();
2650 cmp = d->fragment.compare(url.d->fragment);
2655 Returns true if this URL and the given \a url are equal;
2656 otherwise returns false.
2658 bool QUrl::operator ==(const QUrl &url) const
2663 return url.d->isEmpty();
2665 return d->isEmpty();
2667 // Compare which sections are present, but ignore Host
2668 // which is set by parsing but not by construction, when empty.
2669 const int mask = QUrlPrivate::FullUrl & ~QUrlPrivate::Host;
2670 return (d->sectionIsPresent & mask) == (url.d->sectionIsPresent & mask) &&
2671 d->scheme == url.d->scheme &&
2672 d->userName == url.d->userName &&
2673 d->password == url.d->password &&
2674 d->host == url.d->host &&
2675 d->port == url.d->port &&
2676 d->path == url.d->path &&
2677 d->query == url.d->query &&
2678 d->fragment == url.d->fragment;
2682 Returns true if this URL and the given \a url are not equal;
2683 otherwise returns false.
2685 bool QUrl::operator !=(const QUrl &url) const
2687 return !(*this == url);
2691 Assigns the specified \a url to this object.
2693 QUrl &QUrl::operator =(const QUrl &url)
2702 qAtomicAssign(d, url.d);
2710 Assigns the specified \a url to this object.
2712 QUrl &QUrl::operator =(const QString &url)
2714 if (url.isEmpty()) {
2718 d->parse(url, TolerantMode);
2724 \fn void QUrl::swap(QUrl &other)
2727 Swaps URL \a other with this URL. This operation is very
2728 fast and never fails.
2738 d = new QUrlPrivate;
2746 bool QUrl::isDetached() const
2748 return !d || d->ref.load() == 1;
2753 Returns a QUrl representation of \a localFile, interpreted as a local
2754 file. This function accepts paths separated by slashes as well as the
2755 native separator for this platform.
2757 This function also accepts paths with a doubled leading slash (or
2758 backslash) to indicate a remote file, as in
2759 "//servername/path/to/file.txt". Note that only certain platforms can
2760 actually open this file using QFile::open().
2762 \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
2764 QUrl QUrl::fromLocalFile(const QString &localFile)
2767 url.setScheme(fileScheme());
2768 QString deslashified = QDir::fromNativeSeparators(localFile);
2770 // magic for drives on windows
2771 if (deslashified.length() > 1 && deslashified.at(1) == QLatin1Char(':') && deslashified.at(0) != QLatin1Char('/')) {
2772 deslashified.prepend(QLatin1Char('/'));
2773 } else if (deslashified.startsWith(QLatin1String("//"))) {
2774 // magic for shared drive on windows
2775 int indexOfPath = deslashified.indexOf(QLatin1Char('/'), 2);
2776 url.setHost(deslashified.mid(2, indexOfPath - 2));
2777 if (indexOfPath > 2)
2778 deslashified = deslashified.right(deslashified.length() - indexOfPath);
2780 deslashified.clear();
2783 url.setPath(deslashified, DecodedMode);
2788 Returns the path of this URL formatted as a local file path. The path
2789 returned will use forward slashes, even if it was originally created
2790 from one with backslashes.
2792 If this URL contains a non-empty hostname, it will be encoded in the
2793 returned value in the form found on SMB networks (for example,
2794 "//servername/path/to/file.txt").
2796 Note: if the path component of this URL contains a non-UTF-8 binary
2797 sequence (such as %80), the behaviour of this function is undefined.
2799 \sa fromLocalFile(), isLocalFile()
2801 QString QUrl::toLocalFile() const
2803 // the call to isLocalFile() also ensures that we're parsed
2808 QString ourPath = path(QUrl::FullyDecoded);
2810 // magic for shared drive on windows
2811 if (!d->host.isEmpty()) {
2812 tmp = QStringLiteral("//") + host() + (ourPath.length() > 0 && ourPath.at(0) != QLatin1Char('/')
2813 ? QLatin1Char('/') + ourPath : ourPath);
2817 // magic for drives on windows
2818 if (ourPath.length() > 2 && ourPath.at(0) == QLatin1Char('/') && ourPath.at(2) == QLatin1Char(':'))
2827 Returns true if this URL is pointing to a local file path. A URL is a
2828 local file path if the scheme is "file".
2830 Note that this function considers URLs with hostnames to be local file
2831 paths, even if the eventual file path cannot be opened with
2834 \sa fromLocalFile(), toLocalFile()
2836 bool QUrl::isLocalFile() const
2838 if (!d) return false;
2840 if (d->scheme != fileScheme())
2841 return false; // not file
2846 Returns true if this URL is a parent of \a childUrl. \a childUrl is a child
2847 of this URL if the two URLs share the same scheme and authority,
2848 and this URL's path is a parent of the path of \a childUrl.
2850 bool QUrl::isParentOf(const QUrl &childUrl) const
2852 QString childPath = childUrl.path();
2855 return ((childUrl.scheme().isEmpty())
2856 && (childUrl.authority().isEmpty())
2857 && childPath.length() > 0 && childPath.at(0) == QLatin1Char('/'));
2859 QString ourPath = path();
2861 return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
2862 && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
2863 && childPath.startsWith(ourPath)
2864 && ((ourPath.endsWith(QLatin1Char('/')) && childPath.length() > ourPath.length())
2865 || (!ourPath.endsWith(QLatin1Char('/'))
2866 && childPath.length() > ourPath.length() && childPath.at(ourPath.length()) == QLatin1Char('/'))));
2870 #ifndef QT_NO_DATASTREAM
2873 Writes url \a url to the stream \a out and returns a reference
2876 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
2878 QDataStream &operator<<(QDataStream &out, const QUrl &url)
2882 u = url.toEncoded();
2889 Reads a url into \a url from the stream \a in and returns a
2890 reference to the stream.
2892 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
2894 QDataStream &operator>>(QDataStream &in, QUrl &url)
2898 url.setUrl(QString::fromLatin1(u));
2901 #endif // QT_NO_DATASTREAM
2903 #ifndef QT_NO_DEBUG_STREAM
2904 QDebug operator<<(QDebug d, const QUrl &url)
2906 d.maybeSpace() << "QUrl(" << url.toDisplayString() << ')';
2914 Returns a text string that explains why the URL is invalid, if that is the case;
2915 otherwise returns an empty string.
2917 QString QUrl::errorString() const
2922 if (d->sectionHasError == 0)
2925 // check if the error code matches a section with error
2926 if ((d->sectionHasError & (d->errorCode >> 8)) == 0)
2929 QChar c = d->errorSupplement;
2930 switch (QUrlPrivate::ErrorCode(d->errorCode)) {
2931 case QUrlPrivate::NoError:
2934 case QUrlPrivate::InvalidSchemeError: {
2935 QString msg = QStringLiteral("Invalid scheme (character '%1' not permitted)");
2938 case QUrlPrivate::SchemeEmptyError:
2939 return QStringLiteral("Empty scheme");
2941 case QUrlPrivate::InvalidUserNameError:
2942 return QString(QStringLiteral("Invalid user name (character '%1' not permitted)"))
2945 case QUrlPrivate::InvalidPasswordError:
2946 return QString(QStringLiteral("Invalid password (character '%1' not permitted)"))
2949 case QUrlPrivate::InvalidRegNameError:
2950 if (d->errorSupplement)
2951 return QString(QStringLiteral("Invalid hostname (character '%1' not permitted)"))
2954 return QStringLiteral("Hostname contains invalid characters");
2955 case QUrlPrivate::InvalidIPv4AddressError:
2956 return QString(); // doesn't happen yet
2957 case QUrlPrivate::InvalidIPv6AddressError:
2958 return QStringLiteral("Invalid IPv6 address");
2959 case QUrlPrivate::InvalidIPvFutureError:
2960 return QStringLiteral("Invalid IPvFuture address");
2961 case QUrlPrivate::HostMissingEndBracket:
2962 return QStringLiteral("Expected ']' to match '[' in hostname");
2964 case QUrlPrivate::InvalidPortError:
2965 case QUrlPrivate::PortEmptyError:
2966 return QStringLiteral("Invalid port or port number out of range");
2968 case QUrlPrivate::InvalidPathError:
2969 return QString(QStringLiteral("Invalid path (character '%1' not permitted)"))
2971 case QUrlPrivate::PathContainsColonBeforeSlash:
2972 return QStringLiteral("Path component contains ':' before any '/'");
2974 case QUrlPrivate::InvalidQueryError:
2975 return QString(QStringLiteral("Invalid query (character '%1' not permitted)"))
2978 case QUrlPrivate::InvalidFragmentError:
2979 return QString(QStringLiteral("Invalid fragment (character '%1' not permitted)"))
2982 return QStringLiteral("<unknown error>");
2986 \typedef QUrl::DataPtr
2991 \fn DataPtr &QUrl::data_ptr()
2995 /*! \fn uint qHash(const QUrl &url, uint seed = 0)
2999 Returns the hash value for the \a url.
3001 uint qHash(const QUrl &url, uint seed)
3004 return qHash(-1, seed); // the hash of an unset port (-1)
3006 return qHash(url.d->scheme) ^
3007 qHash(url.d->userName) ^
3008 qHash(url.d->password) ^
3009 qHash(url.d->host) ^
3010 qHash(url.d->port, seed) ^
3011 qHash(url.d->path) ^
3012 qHash(url.d->query) ^
3013 qHash(url.d->fragment);
3016 static QUrl adjustFtpPath(QUrl url)
3018 if (url.scheme() == ftpScheme()) {
3019 QString path = url.path();
3020 if (path.startsWith(QLatin1String("//")))
3021 url.setPath(QLatin1String("/%2F") + path.midRef(2));
3027 // The following code has the following copyright:
3029 Copyright (C) Research In Motion Limited 2009. All rights reserved.
3031 Redistribution and use in source and binary forms, with or without
3032 modification, are permitted provided that the following conditions are met:
3033 * Redistributions of source code must retain the above copyright
3034 notice, this list of conditions and the following disclaimer.
3035 * Redistributions in binary form must reproduce the above copyright
3036 notice, this list of conditions and the following disclaimer in the
3037 documentation and/or other materials provided with the distribution.
3038 * Neither the name of Research In Motion Limited nor the
3039 names of its contributors may be used to endorse or promote products
3040 derived from this software without specific prior written permission.
3042 THIS SOFTWARE IS PROVIDED BY Research In Motion Limited ''AS IS'' AND ANY
3043 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
3044 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
3045 DISCLAIMED. IN NO EVENT SHALL Research In Motion Limited BE LIABLE FOR ANY
3046 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
3047 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
3048 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
3049 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
3050 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
3051 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3057 Returns a valid URL from a user-supplied \a userInput string if one can be
3058 deduced. If that is not possible, an invalid QUrl() is returned.
3062 Most applications that can browse the web allow the user to input a URL
3063 in the form of a plain string. This string can be manually typed into
3064 a location bar, obtained from the clipboard, or passed in via command
3067 When the string is not already a valid URL, a best guess is performed,
3068 making various web-related assumptions.
3070 In the case the string corresponds to a valid file path on the system,
3071 a file:// URL is constructed, using QUrl::fromLocalFile().
3073 If that is not the case, an attempt is made to turn the string into an
3074 http:// or ftp:// URL; the latter is chosen when the string starts with
3075 'ftp'. The result is then passed through QUrl's tolerant parser. On
3076 success, a valid QUrl is returned; otherwise, a QUrl() is returned.
3081 \li qt.nokia.com becomes http://qt.nokia.com
3082 \li ftp.qt.nokia.com becomes ftp://ftp.qt.nokia.com
3083 \li hostname becomes http://hostname
3084 \li /home/user/test.html becomes file:///home/user/test.html
3087 QUrl QUrl::fromUserInput(const QString &userInput)
3089 QString trimmedString = userInput.trimmed();
3091 // Check first for files, since on Windows drive letters can be interpreted as schemes
3092 if (QDir::isAbsolutePath(trimmedString))
3093 return QUrl::fromLocalFile(trimmedString);
3095 QUrl url = QUrl(trimmedString, QUrl::TolerantMode);
3096 QUrl urlPrepended = QUrl(QStringLiteral("http://") + trimmedString, QUrl::TolerantMode);
3098 // Check the most common case of a valid url with scheme and host
3099 // We check if the port would be valid by adding the scheme, to handle the case host:port
3100 // where the host would be interpreted as the scheme
3102 && !url.scheme().isEmpty()
3103 && (!url.host().isEmpty() || !url.path().isEmpty())
3104 && urlPrepended.port() == -1)
3105 return adjustFtpPath(url);
3107 // Else, try the prepended one and adjust the scheme from the host name
3108 if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty()))
3110 int dotIndex = trimmedString.indexOf(QLatin1Char('.'));
3111 const QString hostscheme = trimmedString.left(dotIndex).toLower();
3112 if (hostscheme == ftpScheme())
3113 urlPrepended.setScheme(ftpScheme());
3114 return adjustFtpPath(urlPrepended);