How to parse XML with Expat in C/C++
One of the most widely used XML parsers in the world, relied on by CMake, Godot, Firefox, LibreOffice and others.
In today’s cpp::daily we will talk about one of the most widely used libraries in the world.
There are several libraries for parsing XML, and we will soon post a list of the best ones. Each has its pros and cons, but Expat stands out in several ways, starting with the fact that it is recommended by the W3C.
Admittedly, Expat is harder to use than some alternatives, but its results are very reliable. Besides being the most widely used of all, it is used by projects such as AbiWord, Android Studio, Apache OpenOffice, Audacity, aria2, CMake, D-Bus, Electron, ELinks, Firefox, Git, Godot, LibreOffice and many others.
Expat also has bindings for several other programming languages, such as Python, PHP and Perl.
Expat is written in C and can be intimidating because of the many handler types and options you can set. But you only need to learn four functions to do 80% of what you will want to do with it:
XML_ParserCreate - creates a new parser object.
XML_SetElementHandler - sets the handlers for start and end tags.
XML_SetCharacterDataHandler - sets the handler for text content.
XML_Parse - passes a buffer of document data to the parser.
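First of all, you need Expat installed, including its headers. It is in every major distribution’s repositories, so just use your distro’s package manager; for example, the development package is called libexpat1-dev on Debian and Ubuntu, expat-devel on Fedora and expat on Arch Linux.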
To understand how it works, let’s go through an example. For more details, I suggest reading the documentation, which is very concise; the only thing I missed in it was a simple example showing how everything fits together.
So I created this basic but already functional example. For didactic purposes the code is not object-oriented, so you can use it in both C and C++.
For this we are going to use this XML example:
And to read this XML we will use this code:
To compile, link against the Expat library (with GCC or Clang, add the -lexpat flag).
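Now just run ./parser and notice that the tags are displayed by the function start(void *userData, const char *name, const char **args), more precisely through its name parameter (args receives the tag’s attributes as alternating name/value strings ending in NULL). The content of the tags is displayed by the function value(void *userData, const char *val, int len), more precisely through the variable cpy: the text in val is not NUL-terminated, so it has to be copied using len before it can be printed as a C string. The end function has an empty body; it exists only to be passed, together with start, to XML_SetElementHandler, as mentioned above.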
Try modifying it, displaying the results in tables and so on, as practice, ok?!