The other day I saw a simple but interesting question on the internet. Someone posted wanting to know: “How to remove HTML tags in C?”
RegEx quickly came to mind, but with C++.
If you understand regular expressions in C++, it is really very easy. Just include the <regex> header and use the regex_replace() function to replace the tags with the string you want. In summary, the code is this:
Probable output:
This is a link
But in C, things are really not that easy.
You can use regex.h in C, but it only checks for patterns; the replacement is up to you.
For example, to check whether a given string has tags in it, we can use it like this:
Likely output:
Has tags!
For more information, access the POSIX manual page with the command:
Once you have checked whether a given string has tags (which saves processing when it has none), the next step is to remove them.
I came up with a simple solution of my own 💡 that may be contested by C lovers, but it works 😎. The code itself is:
stdio.h, to use printf;
string.h, to use strlen;
stdbool.h, to use the bool type;
the SIZE constant (the size of the output buffer), to optimize performance;
a function that returns char *, so that the string can be redefined.
And that function is as follows:
A for loop goes through the string according to the number of characters in it;
when a < is identified in the string, tag is set to true;
while tag is false, each character is copied to out[SIZE];
tag goes back to false only after identifying the > closing tag character.
The final code is:
Probable output:
This is a link
The right thing would be to allocate space on the heap, because a string that contains an HTML document can be huge. But for didactic purposes, and to understand the logic, a fixed size is good enough.