Channel: Programming – Mr Pfisters Random Waffle

XML StringBuilder Gotcha


Ran into another annoying gotcha earlier today. It has to do with the encoding that XML is declared with when you write it out using various methods.

If you look at the following code segment for an XmlWriterSettings object, you will straight away notice the Encoding property being assigned, and thus be thinking 'well, it must output as UTF-8, I set it as that'.

XmlWriterSettings ws = new XmlWriterSettings();
ws.Encoding = System.Text.Encoding.UTF8;

So you would expect that any XML document you export using these settings would have the following header:

<?xml version="1.0" encoding="UTF-8" ?>

However, if you use a StringBuilder as the output mechanism for an XmlWriter, you will always get the following header, no matter what encoding you put in the XmlWriterSettings object:

<?xml version="1.0" encoding="UTF-16" ?>

If anything, I would have expected the physical encoding to remain UTF-16 while the XML header incorrectly stated UTF-8 based on the supplied settings. It's actually quite nice in a way that the writer instead detects the encoding of the output it is writing to (the StringBuilder is wrapped in a text writer that reports UTF-16) and uses this to supersede the one set in the original settings.

All of this is because StringBuilder relies on .NET strings, which are always UTF-16, so XML written via a StringBuilder will always be declared as UTF-16 … Gotcha!
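
To see the behaviour for yourself, here is a minimal sketch that asks for UTF-8 but writes into a StringBuilder. The class name, method name and the "root" element are placeholders I made up for illustration, not part of any real project.

```csharp
using System;
using System.Text;
using System.Xml;

public static class StringBuilderEncodingDemo
{
    // Writes a one-element document into a StringBuilder while requesting UTF-8.
    public static string WriteViaStringBuilder()
    {
        XmlWriterSettings ws = new XmlWriterSettings();
        ws.Encoding = Encoding.UTF8; // requested, but ignored for StringBuilder output

        StringBuilder sb = new StringBuilder();
        using (XmlWriter xmlWriter = XmlWriter.Create(sb, ws))
        {
            xmlWriter.WriteStartDocument();
            xmlWriter.WriteElementString("root", "hello"); // placeholder content
        }

        // The writer has been disposed, so everything is flushed into sb.
        return sb.ToString();
    }

    public static void Main()
    {
        // Inspect the declaration at the start of the printed text.
        Console.WriteLine(WriteViaStringBuilder());
    }
}
```

When run, the declaration at the start of the printed text should name UTF-16 rather than UTF-8, even though the settings asked for UTF-8.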

If you need to write out as UTF-8, or any other encoding, you will have to use a different output mechanism rather than StringBuilder. My usual approach is writing to a MemoryStream, as this allows much more flexibility.

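using System.IO;
using System.Text;
using System.Xml;

XmlWriterSettings ws = new XmlWriterSettings();
ws.Encoding = Encoding.UTF8;

// A stream holds raw bytes, so the writer can honour the requested encoding.
MemoryStream xmlCompletedCache = new MemoryStream();
using (XmlWriter xmlWriter = XmlWriter.Create(xmlCompletedCache, ws))
{
    // Write xml here
}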
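
Once the writer has been disposed, the MemoryStream holds the finished UTF-8 bytes. The sketch below shows one way to get them back out; the class name, method name and "root" element are again made up for illustration. One assumption to flag: it uses new UTF8Encoding(false) rather than Encoding.UTF8, because Encoding.UTF8 prepends a byte order mark that you may not want at the front of the output.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class MemoryStreamEncodingDemo
{
    // Writes a one-element document to a MemoryStream and returns the raw bytes.
    public static byte[] WriteUtf8Bytes()
    {
        XmlWriterSettings ws = new XmlWriterSettings();
        // Assumption: BOM-free output is wanted, hence UTF8Encoding(false).
        ws.Encoding = new UTF8Encoding(false);

        using (MemoryStream xmlCompletedCache = new MemoryStream())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(xmlCompletedCache, ws))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteElementString("root", "hello"); // placeholder content
            }

            // Disposing the writer flushes it, so the buffer is complete here.
            return xmlCompletedCache.ToArray();
        }
    }

    public static void Main()
    {
        byte[] bytes = WriteUtf8Bytes();
        // Decode only for display; the bytes themselves are what you would save or send.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

This time the declaration in the decoded text should name UTF-8, matching the bytes actually written.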
