How to get Mime-Type of files with C++

The correct way to avoid data insecurity!



MIME (Multipurpose Internet Mail Extensions) type is a standard used on the internet to indicate the type of content of a file.

Originally developed to identify the types of files attached to emails, the MIME type is now widely used in different contexts, such as on the web, to indicate the type of content of files transmitted via the HTTP protocol.

Each file type is associated with a specific MIME type, which is represented by a string. For example, the MIME type for plain text files is text/plain, while the MIME type for JPEG images is image/jpeg. There are hundreds of standard MIME types that cover a variety of file types, from text documents to audio and video files.
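
Since a MIME type is just a string of the form type/subtype, it is easy to take apart in C++. The sketch below is only an illustration: the function name split_mime is hypothetical, it assumes a C++17 compiler, and it simply drops any parameters such as "; charset=utf-8".

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: split a MIME type such as "image/png" into its
// type and subtype. Returns std::nullopt when there is no '/' or when
// one of the two halves is empty.
std::optional<std::pair<std::string, std::string>>
split_mime(std::string_view mime) {
    // Drop any parameters that follow the first ';'
    if (auto semi = mime.find(';'); semi != std::string_view::npos) {
        mime = mime.substr(0, semi);
    }
    // Trim spaces that may precede the ';'
    while (!mime.empty() && mime.back() == ' ') {
        mime.remove_suffix(1);
    }

    const auto slash = mime.find('/');
    if (slash == std::string_view::npos || slash == 0 ||
        slash + 1 == mime.size()) {
        return std::nullopt;
    }
    return std::make_pair(std::string(mime.substr(0, slash)),
                          std::string(mime.substr(slash + 1)));
}
```

With this sketch, split_mime("image/jpeg") yields the pair ("image", "jpeg"), and a string without a slash yields an empty optional.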

In this article we will see how to identify the MIME type of a file using C++ on GNU+Linux and, as a bonus tip, on Windows.


Identifying mime-type in Linux

To achieve this on GNU+Linux distributions we will use the libmagic library.

It is the library behind the file command. It provides the magic.h header that we will include, and our program will be linked against it.

To install it, use your system’s package manager. Some examples:

sudo apt install libmagic-dev # Debian, Ubuntu, Mint, ...
sudo pacman -S file # Arch (libmagic ships with the file package)
sudo dnf install file-devel # Fedora
brew install libmagic # macOS

If you don’t find it in your system’s repository, you can compile from scratch:

Before you begin, make sure you have the build tools gcc and make installed, as well as wget to download the tarball.

wget ftp://ftp.astron.com/pub/file/file-5.40.tar.gz
tar -xzf file-5.40.tar.gz
cd file-5.40
./configure
make
sudo make install

See here the path where the files are installed

For this example, let’s check the MIME type of the image below, which is in PNG format. Download it by right-clicking on the image and choosing Save as, and save it in the directory where the compiled binary will be.

[Image: cpp-icon.png, available for download]

Create a main.cpp file and paste the code below:

The code is commented to explain what each block does.

// Include iostream to write to standard output and magic.h for libmagic
#include <iostream>
#include <magic.h>

int main() {
    // Handle (the "cookie") that libmagic uses to keep its state
    magic_t magic_cookie;

    // Path of the file whose MIME type we want to know
    const char *file_path = "cpp-icon.png";

    // Initialize libmagic, asking it to report MIME types
    magic_cookie = magic_open(MAGIC_MIME_TYPE);
    if (magic_cookie == NULL) {
        std::cerr << "Unable to initialize libmagic\n";
        return 1;
    }

    // Load the default database of file type definitions
    if (magic_load(magic_cookie, NULL) != 0) {
        std::cerr << "Unable to load database definitions\n";
        magic_close(magic_cookie);
        return 1;
    }

    // Determine the MIME type of the file from its content
    const char *mime_type = magic_file(magic_cookie, file_path);
    if (mime_type == NULL) {
        std::cerr << "Unable to determine the MIME type of the file\n";
        magic_close(magic_cookie);
        return 1;
    }

    std::cout << "MIME type of file: " << mime_type << "\n";

    // Release the libmagic handle
    magic_close(magic_cookie);

    return 0;
}

Once that’s done, compile it with the -lmagic flag and run it:

g++ main.cpp -lmagic
./a.out

The expected output is:

MIME type of file: image/png

Note that if we change the file’s extension, the file manager may display an icon matching the new extension, but libmagic is not fooled: it inspects the content of the file and reports its true MIME type.

This matters in practice. A web page may accept only JPEG and PNG uploads, and a malicious user who wants to run code on the server can simply rename a script so that it carries one of those extensions, even though the file is still a script.
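
An upload check along these lines could compare the detected MIME type against an allow-list. The following is a minimal sketch, assuming C++17; the function name is_allowed_upload is hypothetical, and the string passed in is meant to be the value returned by magic_file(), not something derived from the file name.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical allow-list for an upload form that accepts only JPEG and PNG.
// `detected_mime` should be the MIME type detected from the file content
// (for example, the string returned by magic_file()).
bool is_allowed_upload(std::string_view detected_mime) {
    constexpr std::array<std::string_view, 2> allowed{"image/jpeg",
                                                      "image/png"};
    return std::find(allowed.begin(), allowed.end(), detected_mime) !=
           allowed.end();
}
```

With this check, a script renamed to .png would be rejected as long as the detected type is something other than image/jpeg or image/png.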

If I rename the file to .mp4, for example:

mv cpp-icon.png cpp-icon.mp4

And change the code to load cpp-icon.mp4:

const char *file_path = "cpp-icon.mp4";

After compiling and running again, you will see that libmagic still reports the correct MIME type, image/png, rather than one derived from the extension.
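
To get an intuition for why renaming does not matter, here is a rough sketch of content-based detection. It is not how libmagic is implemented (its database is far richer); the function name sniff_image_type is hypothetical, and only the well-known leading bytes of PNG and JPEG files are checked.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: guess a MIME type from the first bytes of a file,
// ignoring the file name entirely. Only PNG and JPEG are recognised here;
// anything else (including a file that cannot be opened) falls back to
// application/octet-stream.
std::string sniff_image_type(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<unsigned char, 8> head{};
    in.read(reinterpret_cast<char*>(head.data()), head.size());
    const std::size_t got = static_cast<std::size_t>(in.gcount());

    // PNG files start with these 8 bytes
    constexpr std::array<unsigned char, 8> png{0x89, 'P',  'N',  'G',
                                               0x0D, 0x0A, 0x1A, 0x0A};
    if (got == png.size() && head == png) {
        return "image/png";
    }

    // JPEG files start with the bytes FF D8 FF
    if (got >= 3 && head[0] == 0xFF && head[1] == 0xD8 && head[2] == 0xFF) {
        return "image/jpeg";
    }

    return "application/octet-stream";
}
```

Because the decision depends only on the bytes read from the file, a valid PNG keeps being recognised as image/png whatever extension it carries.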


Tip to identify Mime-Type on Windows

On Windows, you can use urlmon.h with the code below:

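#include <iostream>
#include <windows.h>
#include <urlmon.h>

#pragma comment(lib, "urlmon.lib")

int main() {
    // Replace with the path of the file to inspect
    LPCWSTR file_path = L"file_path";
    LPWSTR mime_type = NULL;

    // Note: no data buffer is passed (third and fourth arguments), so the
    // function has only the name above to work with. Passing the file's
    // leading bytes there lets it inspect the content instead.
    HRESULT hr = FindMimeFromData(NULL, file_path, NULL, 0, NULL, 0, &mime_type, 0);
    if (SUCCEEDED(hr) && mime_type != NULL) {
        std::wcout << L"MIME type of file: " << mime_type << std::endl;
        // Release the string returned by FindMimeFromData
        CoTaskMemFree(mime_type);
    } else {
        std::cerr << "Unable to determine the MIME type of the file\n";
    }

    return 0;
}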

Keep in mind that, for this to compile, urlmon.dll needs to be enabled in the Windows Registry, as described in this link.


Beware of fakes!

There are several C++ “libraries” on GitHub that report a FAKE MIME type: if we rename the file as described above, these “libraries” report the wrong MIME type. This is unsafe and dangerous!

For example, I found this one: https://github.com/lasselukkari/MimeTypes. Clone it:

git clone https://github.com/lasselukkari/MimeTypes

Then use this example after renaming the file:

#include <iostream>
#include "MimeTypes/MimeTypes.h"

int main(){
   std::cout << MimeTypes::getType("./cpp-icon.mp4") << '\n';
   return 0;
}

Compile your file and MimeTypes.cpp from this repository:

g++ main.cpp MimeTypes/MimeTypes.cpp

After running, note that it will incorrectly report the MIME type of the file:

video/mp4

This does not happen with libmagic!
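
To see why a result like this can happen, consider a lookup that works from the file name alone. The sketch below is not the code of the repository above; the function name mime_from_extension and the tiny table are hypothetical, and it assumes C++17.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical extension-to-MIME lookup: the file is never opened,
// so renaming it changes the answer.
std::string mime_from_extension(std::string_view path) {
    static const std::unordered_map<std::string_view, std::string> table{
        {"png", "image/png"}, {"jpg", "image/jpeg"}, {"mp4", "video/mp4"}};

    // Everything after the last '.' is treated as the extension
    const auto dot = path.rfind('.');
    if (dot == std::string_view::npos) {
        return "application/octet-stream";
    }

    const auto it = table.find(path.substr(dot + 1));
    if (it == table.end()) {
        return "application/octet-stream";
    }
    return it->second;
}
```

Here mime_from_extension("./cpp-icon.mp4") returns video/mp4 even when the file is really a PNG, which is exactly the behavior to watch out for.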


Not every repository on GitHub reports a fake MIME type, but be aware of these cases!


Marcos Oliveira

Software developer
https://github.com/terroo