In today’s cpp::daily we will talk about one of the most used libraries in the world.
There are several libraries for parsing XML, and we will post a list of the best ones later, each with its pros and cons. Expat stands out in several ways, starting with the fact that it is referenced by the W3C.
Expat is admittedly the hardest of them to use, but the results are dependable. It is also the most widely used of all: projects such as AbiWord, Android Studio, Apache OpenOffice, Audacity, aria2, CMake, D-Bus, Electron, Elinks, Firefox, Git, Godot, LibreOffice and many others rely on it.
Expat can be used from several programming languages, such as Python, PHP, Perl and others.
It is written in C and can be intimidating because of the many kinds of handlers and options you can set. But you only need to learn four functions to do 80% of what you will want to do with it:
XML_ParserCreate - To create a new parser object.
XML_SetElementHandler - To define handlers for start and end tags.
XML_SetCharacterDataHandler - To define the handler for text.
XML_Parse - To pass a buffer full of document to the parser.
First of all you need to have the library and its headers installed. It is in practically every distro's repositories, so just use your distro's package manager; the development package usually has a name like libexpat1-dev (Debian, Ubuntu) or expat-devel (Fedora).
To make this concrete, let's walk through a code example. For more details I suggest reading the documentation, which is very concise; the only thing I missed there was a simple example that shows the overall picture.
So I created this basic, but already functional, example. For didactic purposes the code is not object-oriented, so you can use it in both C and C++.
For this we are going to use this XML example: vim programmers.xml
And to read this XML we will use this code: vim main.cpp
To compile, link against Expat with -lexpat, for example:
g++ main.cpp -lexpat -o parser
Now just run ./parser and notice that the tags are displayed by the function start(void * userData, const char * name, const char * args []), more precisely by the name variable.
The content of the tags is displayed by value(void * userData, const char * val, int len), more precisely by the variable cpy. The end function has an empty body; it exists because XML_SetElementHandler takes both a start and an end handler, as stated above.
Try modifying it, displaying the data in tables and so on, as an exercise, ok?!