An introduction to GlusterFS

On the GlusterFS web page, they describe their product simply like this: “Gluster is a free and open source software scalable network filesystem”. It is interesting because it is designed to scale by adding new resources to the Gluster cluster.

There is a nice description of what GlusterFS is on their web page, where you can read:

  • Gluster is an easy way to provision your own storage backend NAS using almost any hardware you choose.
  • It’s not just a toy. Gluster is enterprise-ready, and commercial support is available if you need it. It is used in some of the most taxing environments like media serving, natural resource exploration, medical imaging, and even as a filesystem for Big Data.

However, they also warn, among other things, that “Gluster does not support so called ‘structured data’, meaning live, SQL databases. Of course, using Gluster to backup and restore the database would be fine”. Using Gluster to store your database’s live data files might lead to delays, and sharing a volume between different database servers might lead to corruption.

GlusterFS in the Servers

Following my usual Ubuntu 20.04 installations, I’ll describe how to install GlusterFS on 2 or more “server” nodes (it doesn’t work with only one node). Let’s imagine we have 3 nodes on which to install glusterfs-server, named gluster1, gluster2 and gluster3. So, on all 3 nodes we should run:

sudo apt install software-properties-common
sudo apt-add-repository ppa:gluster/glusterfs-7
sudo apt update
sudo apt install glusterfs-server

# Enable and start the glusterd service:
sudo systemctl enable glusterd
sudo systemctl start glusterd

Great, after running those commands we have GlusterFS installed on the 3 servers. Now we need to “connect” them so they start working together. So, on only one node (let’s say gluster1, and assuming every node can reach every other node by name) we can simply run the commands:

sudo gluster peer probe gluster2
sudo gluster peer probe gluster3

We can check that the peers are connected with the command:

gluster peer status

Once the nodes are connected, we’ll be able to create a new volume to be shared using GlusterFS. For testing purposes we are going to create the volumes on the root partition, which is not recommended; the recommendation is to use a separate partition for the bricks, so, sorry for the “trick”. Anyway, it is enough for learning and testing.
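In a real deployment, the bricks would live on a dedicated partition. A minimal sketch of that setup, assuming a spare disk /dev/sdb on each node (the device name, filesystem and mount point here are just illustrative), could be:

# Format a dedicated disk and mount it as the brick directory (run on each node)
sudo mkfs.xfs /dev/sdb                    # XFS is the usual choice for bricks (needs xfsprogs)
sudo mkdir -p /data/brick1
sudo mount /dev/sdb /data/brick1
echo '/dev/sdb /data/brick1 xfs defaults 0 0' | sudo tee -a /etc/fstab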

First, we need to create the directory that will contain the volume data (the “brick”) on the 3 Gluster nodes (if we wanted only 2 replicas, the directory would only be needed on 2 nodes). So, run this command on the 3 nodes:

sudo mkdir /storage

Once the directory exists on the 3 nodes, we can create the Gluster volume with “gluster volume create” (the force parameter is there because we are using the root partition) and start the volume on the nodes using “gluster volume start”. In this example the name of the volume is “mongodb”:

sudo gluster volume create mongodb replica 3 transport tcp gluster1:/storage/mongodb gluster2:/storage/mongodb gluster3:/storage/mongodb force
sudo gluster volume start mongodb

To list the volumes or check their status, we can use the commands:

sudo gluster volume list
sudo gluster volume status [volume_name]

Caveat: for the demo I’ll follow the approach of installing a MongoDB using Docker, with its data stored in GlusterFS.

Two other interesting commands are the ones to stop a volume and to delete it. A volume can’t be deleted while it is started, so we need to stop it before deleting it:

sudo gluster volume stop xxxx
sudo gluster volume delete xxxx
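For example, to remove the mongodb volume created above, we could run:

sudo gluster volume stop mongodb
sudo gluster volume delete mongodb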

GlusterFS in the Client

We can mount the “shared” GlusterFS volumes on the clients where we have the Gluster client installed. In order to install the Gluster client:

sudo apt install software-properties-common
sudo apt-add-repository ppa:gluster/glusterfs-7
sudo apt update
sudo apt install glusterfs-client

Great! We have our client installed now. We are going to run a MongoDB using Docker with its data in our Gluster cluster. First, we create a mount point directory on our Gluster client and mount the volume:

sudo mkdir /mongodb
sudo mount -t glusterfs gluster1:/mongodb /mongodb

And finally, we only need to run the MongoDB Docker container:

sudo docker run -v /mongodb:/data/db -p 27017:27017 --name mongodb -d mongo

The container is now running (I hope you have Docker installed) and the files are synchronized across the 3 Gluster nodes according to the way we created the volume.
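A quick way to verify this, assuming the brick paths used above, is to list the brick directory on any of the server nodes and look for MongoDB’s data files:

# On any gluster node: the MongoDB data files should appear inside the brick
ls -l /storage/mongodb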

Something interesting to do is to configure the volume to be mounted automatically when the Gluster client starts, so a line like this one could be added to /etc/fstab:

gluster1:/mongodb   /mongodb   glusterfs defaults,_netdev,noauto,x-systemd.automount 0 0
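To try the entry without rebooting (assuming a systemd-based client like Ubuntu 20.04, where the automount unit name is derived from the mount point), something like this could be enough:

sudo systemctl daemon-reload                 # regenerate units from /etc/fstab
sudo systemctl start mongodb.automount       # unit generated from the /mongodb entry
ls /mongodb                                  # accessing the path triggers the mount
mount | grep /mongodb                        # confirm the glusterfs mount is active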

Using LXC to Deploy the GlusterFS Cluster

As I’ve done some other times, I’ve written a couple of Ansible scripts to deploy the GlusterFS cluster using LXC. These scripts can be found on my GitHub.

To get Ansible + LXC working on your laptop, you can follow the instructions I gave in my article about deploying Kubernetes on LXC, under the heading “Prepare my server (laptop)”.

LXC Containers in Ubuntu 20.04

LXC, as you can read in https://en.wikipedia.org/wiki/LXC, is an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel.

Basically this means that you can have multiple containers behaving as if they were virtual machines. This is a different approach from Docker, whose containers are meant to run a single application inside. For instance, if we wanted to run a WordPress site using Docker, we would run one container for the database and another one for the HTTP server, and both containers would communicate over the network. However, if we ran the same setup using LXC, we could install the database and the HTTP server in the same container (a kind of lightweight virtual machine). So, I think this is the biggest difference: a Docker container is designed to run a single application, while an LXC container is designed to behave much like a virtual machine; see the sketch below.
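As a rough sketch of that difference (the image names, password and package list are just illustrative assumptions, not a complete WordPress setup), the two approaches could look like this:

# Docker approach: one container per service, communicating over the network
sudo docker run -d --name wp-db -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=wordpress mysql:5.7
sudo docker run -d --name wp -p 80:80 --link wp-db \
  -e WORDPRESS_DB_HOST=wp-db -e WORDPRESS_DB_USER=root -e WORDPRESS_DB_PASSWORD=secret wordpress

# LXC approach: one "machine-like" container with both services installed inside
lxc-create -t ubuntu -n wordpress -- -r focal
lxc-start -d -n wordpress
lxc-attach -n wordpress -- apt install -y apache2 php mysql-server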

Installation and First Steps

In order to install LXC we could simply run:

sudo apt install lxc lxc-templates lxc-utils

Now, we can start working with LXC. Of course, the first thing we’d like to do is create our first LXC container. To do this, we can type the following command:

lxc-create -t ubuntu -n u1 -- -r focal

And after a while, we’ll have our container. It won’t be running yet, but it will be created. The name of my container, in this case, is u1. In order to run the container, we should type:

lxc-start -d -n u1

Once our container is running, we can see it using the command “lxc-ls --fancy”.

In its output you can see that, while the container is STOPPED, it has no IP. After the container is started, it is assigned an IP address and keeps it for the life of the container (until it is destroyed). By default, a bridge named lxcbr0 is created on the LXC host, which acts as the “gateway” for the containers.

NOTE: The default network is 10.0.3.0/24; the 10.0.4.0/24 addresses in my examples are explained below. It doesn’t matter much at this point.

Two other interesting commands for working with containers are lxc-stop, to stop a container, and lxc-destroy, to remove the container permanently:

# Stop a running container
lxc-stop -n u1

# Remove a stopped container
lxc-destroy -n u1

Configuration of Containers

Once the container is created, a new directory is created in /var/lib/lxc with the name of our container. In our example, the directory will be /var/lib/lxc/u1.

There is a file in that directory named config, where we can change the configuration of the container. There are plenty of different settings that can be tuned there.

The default configuration is enough to run most applications. However, there are some use cases that require changing it, for example: running nested LXC or Docker containers, or running virtual machines using KVM.

If we want to run these kinds of applications, we must add the following lines to the end of the config file:

lxc.apparmor.profile = unconfined
lxc.cgroup.devices.allow = a
lxc.cap.drop =

In my example, I’m running LXC inside LXC; that is why the IPs in my examples are in 10.0.4.0/24 instead of 10.0.3.0/24.

Running Docker inside LXC

I create a new container (this time it won’t be nested):

lxc-create -n docker-in-lxc -t ubuntu

Then I add the lines shown above to the end of its config file (/var/lib/lxc/docker-in-lxc/config) and start the container with lxc-start, as sketched below.
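A minimal sketch of those two steps (assuming a non-root user with sudo) could be:

# Append the extra configuration shown above to the container's config file
echo 'lxc.apparmor.profile = unconfined
lxc.cgroup.devices.allow = a
lxc.cap.drop =' | sudo tee -a /var/lib/lxc/docker-in-lxc/config

# Start the container in the background
sudo lxc-start -d -n docker-in-lxc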

I log in to the container and run the command:

sudo apt install docker.io

Once everything is installed, we can test that it works.
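For instance, a minimal check could be running the standard hello-world image inside the container:

# Inside the docker-in-lxc container: pull and run the test image
sudo docker run hello-world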

KVM in LXC

Again, we create another LXC container, change its config file as explained above and ssh into it. Then we install the virtualization packages:

sudo apt install libvirt-clients libvirt-daemon libvirt-daemon-system qemu-kvm qemu-utils qemu-system-x86

Once the packages are installed, libvirt will be available inside the container.

Before we can do anything with this, we’ll need to create a device node which is not present by default in our LXC container: /dev/net/tun, which is a must in order to run our virtual machines with networking.
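A typical way to create it inside the container (a sketch; 10 and 200 are the standard major and minor numbers for the tun device) is:

# Inside the container: create the tun device node needed for VM networking
sudo mkdir -p /dev/net
sudo mknod /dev/net/tun c 10 200
sudo chmod 666 /dev/net/tun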

We also need to be aware that kernel modules are not loaded inside LXC containers: modules are loaded into the kernel, and the kernel is shared by every LXC container. So, if we want virtualization to work, we need the KVM module loaded in the real Linux kernel; that is, we need KVM available on the real operating system.
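On the real host (not inside the container), something like this could be used to check and, if needed, load the module; kvm_intel applies to Intel CPUs and kvm_amd to AMD ones:

# On the host: check whether the KVM modules are loaded
lsmod | grep kvm
# Load the module if it is missing (use kvm_amd on AMD CPUs)
sudo modprobe kvm_intel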

It is not mandatory to have nested virtualization enabled for the module, but anyway, I have it.

In order to test my “brand-new” kvm-in-lxc, I’ve downloaded a small Linux image (CirrOS): http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img
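For example, the image can be fetched from inside the container with wget:

wget http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img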

Once the image is downloaded, I will start a new VM using it. However, it is a cloud image that will try to connect to 169.254.169.254 in order to get cloud metadata. As I don’t have any cloud provider such as OpenStack, booting a virtual machine from this image would take a long, long time while it retries. To make it fail faster, I’m going to add the IP 169.254.169.254 to the eth0 device inside the container:

sudo ip addr add 169.254.169.254/32 dev eth0

And now, I boot a VM from my image:

sudo kvm -no-reboot -nographic -m 2048 -hda cirros-0.5.1-x86_64-disk.img

This VM can be destroyed with “sudo poweroff“.