A **histogram** is a graph or plot that represents the distribution of the pixel intensities in an image. In this post we're going to focus on the RGB color space (see here if you want an explanation of the difference between color spaces such as RGB and Lab), hence the intensity of a pixel is in the range `[0, 255]`. When plotting the histogram we have the pixel intensity on the X-axis and the frequency on the Y-axis. As with any other histogram, we can decide how many bins to use.

A histogram can be calculated both for the grayscale image and for the colored image. In the first case we have a single channel, hence a single histogram. In the second case we have three channels, hence three histograms.

Calculating the histogram of an image is very useful as it gives an intuition regarding some properties of the image such as the **tonal range**, the **contrast** and the **brightness**.

Let's now see how we can calculate the histogram of any given image using OpenCV and display it using `matplotlib`.

OpenCV provides the function `cv2.calcHist` to calculate the histogram of an image. The signature is the following:

```
cv2.calcHist(images, channels, mask, bins, ranges)
```

where:

1. `images` is the image we want to calculate the histogram of, wrapped in a list, so if our image is stored in the variable `image` we will pass `[image]`,
2. `channels` is the list of the indexes of the channels to consider (`[0]` for grayscale images as there's only one channel; `[0]`, `[1]` or `[2]` for color images to consider the blue, green or red channel respectively, since OpenCV loads images in `BGR` order),
3. `mask` is a mask to be applied to the image if we want to consider only a specific region (we're going to ignore this in this post),
4. `bins` is a list containing the number of bins to use for each channel,
5. `ranges` is the range of the possible pixel values, which is `[0, 256]` in the case of the RGB color space (where `256` is not inclusive).

The returned value `hist` is a `numpy.ndarray` with shape `(n_bins, 1)` where `hist[i][0]` is the number of pixels having an intensity value in the range of the *i*-th bin.

We can simplify this interface by wrapping it with a function that, in addition to calculating the histogram, also draws it (for the moment we're going to fix the number of bins to `256`):

```
import cv2
from matplotlib import pyplot as plt

def draw_image_histogram(image, channels, color='k'):
    hist = cv2.calcHist([image], channels, None, [256], [0, 256])
    plt.plot(hist, color=color)
    plt.xlim([0, 256])
```

where `image` can be loaded using `cv2.imread('path/to/image')`.

Let's now see the histograms of these three sample images:

Let's start by considering the histograms of the grayscale versions of the above sample images. We can write the following helper function to display, using `matplotlib`, the histogram of the grayscale version of an image:

```
import cv2
from matplotlib import pyplot as plt

def show_grayscale_histogram(image):
    grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    draw_image_histogram(grayscale_image, [0])
    plt.show()
```

If we execute this function for the sample images we obtain the following histograms:

Let's now analyze these plots and see what kind of information we can extract from them. From the first one we can infer that all the pixels of the corresponding image have low intensity, as almost all of them are approximately in the `[0, 60]` range. From the second one we can see that the distribution of the pixel intensities is still skewed towards the darker side, as the median value is around `80`, but the variance is much larger. Finally, from the last one we can infer that the corresponding image is much lighter overall, but also has some dark regions.

I think that even if the histograms weren't given in the same order as the sample images, anyone would be able to guess which histogram belongs to which image by simply doing the above analysis.

Here are the grayscale images with the corresponding histograms:

Let's now move on to the histograms of the colored sample images. In this case too we can write a helper function to display, using `matplotlib`, the histogram of an image:

```
import cv2
from matplotlib import pyplot as plt

def show_color_histogram(image):
    for i, col in enumerate(['b', 'g', 'r']):
        draw_image_histogram(image, [i], color=col)
    plt.show()
```

If we execute this function for the sample images we obtain the following histograms:

The plots are in the same order as the sample images. As we could have expected, from the first plot we can see that all the channels have low intensities, corresponding to very dark red, green and blue. We also have to consider that the color black, which is given by `(0, 0, 0)` in RGB, is abundant in the corresponding image, and that may explain why all the channels have peaks in the lower part of the X-axis. However, this can't be fully appreciated with this type of visualization, given that we're plotting the three channels independently from each other. Later we will see how we can observe the distribution of the combinations of the channels' values by using **multi-dimensional histograms**.

From the second plot we can observe that there's a dark red peak that may correspond to the rocks and the mountains, while both the green and the blue channels have a wider range of values. From the last plot, if we exclude the peaks of all the channels in the interval `[0, 30]`, we can observe the opposite of what we saw in the first plot. All three channels have high intensities, and if we consider that `(255, 255, 255)` in RGB corresponds to white, then by looking at the image it's clear why the histogram has this distribution.

Here are the sample images with the corresponding histograms:

As anticipated above we're now going to take a look at multi-dimensional histograms.

The idea is that instead of examining each channel separately we analyze them together in groups of 2 (RG, GB, BR) or all together. In the case of 2D histograms the resulting plot will have the pixel intensities of a channel on the X-axis, the pixel intensities of another channel on the Y-axis, and the frequency is given by the color of the plot. Let's look at an example as it is easier to show it than explaining it.

Here's the code that plots the 2D color histograms, one for each pair of channels.

```
import cv2
from matplotlib import pyplot as plt
from matplotlib import ticker

def show_image_histogram_2d(image, bins=32, tick_spacing=5):
    fig, axes = plt.subplots(1, 3, figsize=(12, 5))
    channels_mapping = {0: 'B', 1: 'G', 2: 'R'}
    for i, channels in enumerate([[0, 1], [0, 2], [1, 2]]):
        hist = cv2.calcHist(
            [image], channels, None, [bins] * 2, [0, 256] * 2)

        channel_x = channels_mapping[channels[0]]
        channel_y = channels_mapping[channels[1]]

        ax = axes[i]
        ax.set_xlim([0, bins - 1])
        ax.set_ylim([0, bins - 1])
        ax.set_xlabel(f'Channel {channel_x}')
        ax.set_ylabel(f'Channel {channel_y}')
        ax.set_title(f'2D Color Histogram for {channel_x} and {channel_y}')
        ax.yaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
        ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))

        im = ax.imshow(hist)

    fig.colorbar(im, ax=axes.ravel().tolist(), orientation='horizontal')
    fig.suptitle(f'2D Color Histograms with {bins} bins', fontsize=16)
    plt.show()
```

Let's now take a look at what this function does. Most of the code is just there to produce a nicer plot; the actual logic is really simple. First we initialize the figure by configuring 3 subplots, all in the same row; each subplot will contain a 2D histogram. Then we define the mapping between the channel index and the corresponding channel name. Keep in mind that the color space of the images loaded using OpenCV is `BGR` and not `RGB`, which is why index `0` corresponds to `B` and not `R`, and so on.

We then loop over each pair combination of channels. The first combination is `[0, 1]`, which corresponds to the pair BG, the second one is `[0, 2]`, which corresponds to the pair BR, and the third one is `[1, 2]`, which corresponds to the pair GR.

Inside the loop we call the same function `cv2.calcHist` that we used for 1D histograms. In order to compute a 2D histogram, the `channels` argument now contains two values instead of a single one, the `bins` argument defines how many bins to use for each channel, and the value of the `ranges` argument must be repeated (`[0, 256]` becomes `[0, 256, 0, 256]`). The returned value `hist` is a `numpy.ndarray` with shape `(bins, bins)` where `hist[i][j]` is the number of pixels having an intensity value in the range of the *i*-th bin for the channel on the X-axis and in the range of the *j*-th bin for the channel on the Y-axis.

The remaining lines simply set the axis limits, the tick spacing and the labels to show, and draw each plot. After the for-loop we add a horizontal colorbar and finally show the figure.

Here's the generated plot for the sample image with mid tones.

You may have noticed that we used only `32` bins instead of `256` as previously. That's because using `256` bins means having `256 * 256 = 65536` separate pixel counts, which is not very practical and also a waste of resources. Here's the 2D color histogram for the same sample image using `256` bins.

We can easily extend what we've done for the 2D histogram to calculate a 3D histogram. Essentially the call to `cv2.calcHist` becomes the following:

```
cv2.calcHist([image], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
```

Even in this case the returned value `hist` is a `numpy.ndarray`, this time with shape `(bins, bins, bins)`. Unfortunately we cannot visualize this histogram.

The **histogram equalization** process is an image processing method to adjust the contrast of an image by modifying the image's histogram. The intuition behind this process is that histograms with large peaks correspond to images with low contrast where the background and the foreground are both dark or both light. Hence histogram equalization stretches the peak across the whole range of values leading to an improvement in the global contrast of an image.

It is usually applied to grayscale images and it tends to produce unrealistic effects, but it is widely used where high contrast is needed, such as in medical or satellite images.

OpenCV provides the function `cv2.equalizeHist` to equalize the histogram of an image. The signature is the following:

```
cv2.equalizeHist(image)
```

where:

1. `image` is the grayscale image that we want to equalize (we're going to see later how we can extend this to colored images).

This function returns the image with the equalized histogram.

If you're not interested in the theoretical part feel free to skip this section and its subsection. We're now going to show which formula is being applied when doing histogram equalization and how to derive it.

Let \(n\) be the number of pixels of the given image and \([p_0, p_k]\) the range of the gray levels. Let's also define the histogram of this image as \(H(p)\), where \(H(p_i)\) is the number of pixels having gray level \(p_i\). Our goal is to have a histogram \(G(q)\), where \(G\) is defined similarly to \(H\), which is uniform over all the levels in the range \([q_0, q_k]\). Hence we need to find a monotonic transformation function \(q = \mathcal{T}(p)\).

Given that the total sum of the bins doesn't change, and thanks to the monotonicity, we have:

\[ \sum_{i=0}^k G(q_i) = \sum_{i=0}^k H(p_i) \]

Additionally given that \(G\) is uniform, we have:

\[ G(q) = \frac{n}{q_k - q_0} \]

Since an exactly uniform histogram can be obtained only in the continuous domain, let's rewrite the first formula as follows:

\[ \int_{q_0}^q G(s) ds = \int_{p_0}^p H(s) ds \]

We can now proceed in obtaining the function \(\mathcal{T}\).

\begin{align}
\int_{q_0}^q G(s) ds &= \int_{p_0}^p H(s) ds \\
\int_{q_0}^q \frac{n}{q_k - q_0} ds &= \int_{p_0}^p H(s) ds \\
\frac{n}{q_k - q_0} \int_{q_0}^q ds &= \int_{p_0}^p H(s) ds \\
\frac{n (q - q_0)}{q_k - q_0} &= \int_{p_0}^p H(s) ds \\
q &= \frac{q_k - q_0}{n}\int_{p_0}^p H(s) ds + q_0
\end{align}

We have found our function \(\mathcal{T}\):

\[ q = \mathcal{T}(p) = \frac{q_k - q_0}{n}\int_{p_0}^p H(s) ds + q_0 \]

In the discrete space this corresponds to:

\[ q = \mathcal{T}(p) = \frac{q_k - q_0}{n}\sum_{i = p_0}^p H(i) + q_0 \]

This may seem a complicated formula, but essentially the function \(\mathcal{T}\) is simply a lookup table: if in the original histogram the bin at level \(p\) has count \(v\), then in the equalized histogram that bin is "moved" to level \(q\). This mechanism is applied to all levels \(p\).

Let's see how the histogram equalization works by doing it "by hand" on a very simple fake image. Let our sample grayscale image be defined as follows:

\begin{bmatrix}
0 & 32 & 96 & 64 \\
32 & 32 & 32 & 32 \\
0 & 128 & 0 & 32 \\
128 & 128 & 255 & 64 \\
\end{bmatrix}

We want to equalize the image so that the whole range `[0, 255]` is used. Hence we have \(q_0 = 0\), \(q_k = 255\), \(p_0 = 0\) and \(n = 16\). We can then calculate the following table:

\(p\) | \(H(p)\) | \(\sum_{i=0}^{p} H(i)\) | \(q\) |
---|---|---|---|
0 | 3 | 3 | 48 |
32 | 6 | 9 | 143 |
64 | 2 | 11 | 175 |
96 | 1 | 12 | 191 |
128 | 3 | 15 | 239 |
255 | 1 | 16 | 255 |

where \(q\) is simply:

\[ q = \frac{255}{16}\sum_{i = 0}^p H(i) \]
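The table above can be reproduced with a few lines of NumPy. This is a sketch of the formula as derived here, not of OpenCV's internal implementation, which handles the lowest occupied level slightly differently:

```python
import numpy as np

# The 4x4 sample image from the text
image = np.array([[0, 32, 96, 64],
                  [32, 32, 32, 32],
                  [0, 128, 0, 32],
                  [128, 128, 255, 64]])

n = image.size  # 16
levels, counts = np.unique(image, return_counts=True)
cumulative = np.cumsum(counts)
q = np.round(255 / n * cumulative).astype(int)

for row in zip(levels, counts, cumulative, q):
    print(row)  # (0, 3, 3, 48), (32, 6, 9, 143), ...
```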

We can now plot both the original and the equalized histogram. We can notice that the cumulative equalized histogram is very close to the optimal one.

Let's now see how we can easily equalize a grayscale image and show it. Here's the code:

```
import cv2
from matplotlib import pyplot as plt

def show_grayscale_equalized(image):
    grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    eq_grayscale_image = cv2.equalizeHist(grayscale_image)
    plt.imshow(eq_grayscale_image, cmap='gray')
    plt.show()
```

And here's the comparison between the original grayscale sample image and the equalized one:

As we can see the global contrast has been improved by the equalization process and the histogram has been stretched.

It's possible to extend histogram equalization and apply it to colored images. The most naive approach consists in applying the same process to the three `RGB` channels separately and then merging them back together. The problem is that this process changes the relative distributions of the colors and may consequently lead to dramatic changes in the image's color balance. Here's the code:

```
import cv2
from matplotlib import pyplot as plt

def show_rgb_equalized(image):
    channels = cv2.split(image)
    eq_channels = [cv2.equalizeHist(ch) for ch in channels]
    eq_image = cv2.merge(eq_channels)
    eq_image = cv2.cvtColor(eq_image, cv2.COLOR_BGR2RGB)
    plt.imshow(eq_image)
    plt.show()
```

And here's the result of applying the algorithm to a colored sample image:

An alternative is to first convert the image to the `HSL` or `HSV` color space and then apply the histogram equalization only on the lightness or value channel, leaving the hue and the saturation of the image unchanged. Here's the code that applies the histogram equalization on the value channel of the `HSV` color space:

```
import cv2
from matplotlib import pyplot as plt

def show_hsv_equalized(image):
    H, S, V = cv2.split(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
    eq_V = cv2.equalizeHist(V)
    eq_image = cv2.cvtColor(cv2.merge([H, S, eq_V]), cv2.COLOR_HSV2RGB)
    plt.imshow(eq_image)
    plt.show()
```

And here's the result of applying the algorithm on the same sample image:

In this post we saw how to extract and visualize the histogram of an image, both grayscale and colored, and what kind of intuitions we can infer from that. We also discussed how histogram equalization works and how it can be extended to colored images.

There are also other, more robust algorithms for histogram equalization that haven't been considered here, such as AHE (**Adaptive Histogram Equalization**) and CLAHE (**Contrast Limited Adaptive Histogram Equalization**).

Image histograms are simple, but widely used in image processing. One interesting application is the usage of image histograms to build an image search engine based on the similarity between them, as explained in this blog post.

**Color quantization** is the process of reducing the number of distinct colors used in an image. The main reason we may want to perform this kind of compression is to enable the rendering of an image in devices supporting only a limited number of colors (usually due to memory limitations).

Obviously all compression comes with a cost. In this case the resulting image may differ too much from the original one. Hence the goal of color quantization is to obtain a compressed image *as similar as possible* to the original one. The key factor for achieving this is the selection of the **color palette**, choosing the colors that best *summarize* the original image.

The most common techniques reduce the problem of color quantization to a **clustering problem** over points, where each point represents the color of a pixel. The palette is created by selecting a representative point for each cluster. After that, the compression simply remaps each color to its cluster representative. As you may guess, the resulting palette highly depends on the color space and distance metric used.

Before digging into the palette selection in more detail, here's a simple and brief introduction to color spaces and color difference. The idea is to give a grasp of some concepts that are necessary to understand what comes next, without going into too much detail, as a thorough explanation is out of the scope of this post. Feel free to skip these parts if you already know what they're about.

As previously anticipated a color can be represented as a point in an *n*-dimensional space called **color space**. Most commonly the space is 3-dimensional and the coordinates in that space can be used to encode a color.

There are many color spaces for different purposes and with different *gamuts* (ranges of colors), and in each of them it is possible to define a distance metric that quantifies the **color difference**. The most common and easiest distance metric is the *Euclidean distance*, which is used in the RGB and Lab color spaces.

The **RGB** (abbreviation of red-green-blue) color space is by far the most commonly used color space. The idea is that it is possible to create colors by combining red, green and blue. A color in RGB is usually encoded as a 3-tuple of 8 bits each, hence each dimension takes a value within the range `[0, 255]`, where `0` stands for the absence of the color while `255` stands for its *full presence*.

As previously said in RGB we can use the Euclidean distance to calculate the difference between two colors:

\[ d = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2} \]
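The formula above is straightforward to express in code; for instance (a throwaway helper, not part of any library):

```python
import math

def rgb_distance(c1, c2):
    # Euclidean distance between two RGB triples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

print(rgb_distance((0, 0, 0), (255, 255, 255)))  # ~441.67, the largest possible distance
print(rgb_distance((255, 0, 0), (255, 10, 0)))   # 10.0
```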

One caveat of using the Euclidean distance in the RGB color space is that it isn't uniform with respect to the difference perceived by the eye. In other words, two pairs of colors might not be perceived as equally different even if they have the same Euclidean distance. If we denote with \(d'\) the perceived difference and we calculate the difference between the pairs of colors \((R_1, G_1, B_1)\), \((R_2, G_2, B_2)\) and \((R_1, G_1, B_1)\), \((R_3, G_3, B_3)\) we might have:

\[
d((R_1, G_1, B_1), (R_2, G_2, B_2)) = d((R_1, G_1, B_1), (R_3, G_3, B_3)) \\
d'((R_1, G_1, B_1), (R_2, G_2, B_2)) \neq d'((R_1, G_1, B_1), (R_3, G_3, B_3))
\]

This is due to the fact that the human eye is more sensitive to certain colors than others.

The **Lab** color space, more formally \(L^*a^*b^*\), includes all perceivable colors, meaning that its gamut is a superset of the gamut of the RGB color space, and has the intent of being perceptually uniform, hence solving the problem that affects the RGB color space when using the Euclidean distance. In the case of Lab this distance is known as CIE76 and it is denoted with \(\Delta E^*_{76}\) (CIE stands for *Commission Internationale de l'Éclairage*):

\[ \Delta E^*_{76} = \sqrt{(L^*_1 - L^*_2)^2 + (a^*_1 - a^*_2)^2 + (b^*_1 - b^*_2)^2} \]

The \(L^*\) dimension represents the lightness of the color and its value is in `[0, 100]`, where the higher the value the lighter the color. The \(a^*\) and \(b^*\) dimensions represent the green-red and the blue-yellow components respectively; for both, `0` represents a neutral gray and their range depends on the Lab implementation, but it is usually `[-100, 100]` or `[-128, 127]`.

There are many other color spaces and, even if they're not going to be used in the next sections to perform color quantization, it's worth mentioning the \(L^*C^*h^*\) color space, which has been introduced in order to fix some problems that occur when using \(\Delta E^*_{76}\). For this reason the CIE defined the \(\Delta E^*_{94}\) and then the \(\Delta E^*_{00}\) distances, both defined in terms of the \(L^*C^*h^*\) components.
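As a concrete example of the Lab ranges just described, we can convert a single pure-red pixel with `skimage` (assuming an sRGB input with values in `[0, 1]`):

```python
import numpy as np
from skimage import color

red = np.array([[[1.0, 0.0, 0.0]]])  # 1x1 RGB image, pure red
L, a, b = color.rgb2lab(red)[0, 0]
print(round(L, 1), round(a, 1), round(b, 1))  # roughly 53.2 80.1 67.2
```

Note the lightness around the middle of `[0, 100]` and the strongly positive \(a^*\) (towards red) and \(b^*\) (towards yellow).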

We can now start performing color quantization and see how the output changes depending on the color space used, the method adopted to create the palette, and the size of the palette we want to build.

To compare the different cases we're going to use the image of Lenna:

The easiest method to select the colors that will be part of the palette is to choose them completely at random. After having built the palette, each color is remapped to the nearest color among those selected. As you may guess, using the RGB or the Lab space doesn't make any difference here, and the only thing that affects the final output is the number of colors to use. If with this selection strategy one color space seems to behave better than the other, keep in mind that, given the randomness of the selection, it may just be chance.

The following figure visually explains the whole process in the case of the RGB space, where the final palette is composed of three colors (for the Lab space it is completely analogous). For simplicity, in this example the blue channel is fixed to the value `0`, hence we can consider only the `R` and the `G` dimensions. All the pixels of the image are represented with the RGB encoding and then mapped into the RGB space. Then 3 random colors are selected to be part of the palette. Finally, all the colors are remapped to the nearest selected color according to the Euclidean distance.

Here are the output images generated by randomly selecting the colors in the RGB color space. Their palettes are composed of `8`, `16`, `32` and `64` colors respectively.

Here instead are the output images generated by randomly selecting the colors in the Lab color space. Also in this case their palettes are composed of `8`, `16`, `32` and `64` colors respectively.

The implementation of color quantization via random palette selection is very easy. In the following snippet both the input variable `raster` and the output variable `quantized_raster` are `numpy.ndarray`s which correspond to the raster of the original image and the raster of the quantized image. They can both be encoded with RGB or Lab (the output raster has the same encoding as the input) and have shape `(width, height, 3)`.

```
import numpy as np
from sklearn.metrics import pairwise_distances_argmin
from sklearn.utils import shuffle

def quantize(raster, n_colors):
    width, height, depth = raster.shape
    reshaped_raster = np.reshape(
        raster, (width * height, depth))

    palette = shuffle(reshaped_raster)[:n_colors]
    labels = pairwise_distances_argmin(
        reshaped_raster, palette)

    quantized_raster = np.reshape(
        palette[labels], (width, height, palette.shape[1]))

    return quantized_raster
```

The raster of an image can be retrieved by passing the path of the file to `scipy`:

```
import scipy.misc
raster = scipy.misc.imread(image_filename)
```

The output of `scipy.misc.imread` is a `numpy.ndarray` with shape `(width, height, 3)` and RGB-encoded values. Hence, in order to perform the quantization in the Lab space, we first need to convert from RGB to Lab. Thanks to `skimage` it is possible to convert from RGB to Lab and vice versa:

```
from skimage import color

lab_raster = color.rgb2lab(rgb_raster)
rgb_raster = (color.lab2rgb(lab_raster) * 255).astype('uint8')
```

Finally, to visualize the result of the color quantization we can use `matplotlib`:

```
import matplotlib.pyplot as plt
plt.imshow(rgb_raster / 255.0)
plt.draw()
plt.show()
```

Note that the `imshow` function works only with RGB values normalized to the `[0, 1]` interval, which is why we need to divide by 255.

As anticipated at the beginning of the post, one common approach to perform color quantization is to use a clustering algorithm. In this case we're going to use **k-means**, which minimizes the within-cluster sum of squared distances between the centroid and the other points of the cluster. Given that the distance used by the k-means algorithm is the Euclidean distance, it is a natural fit for color quantization in both the RGB and Lab spaces. For the purpose of this post I'm going to assume that you know how k-means works.

The procedure consists in applying the k-means algorithm with a number of centroids equal to the number of colors we want the palette to be composed of. We then use the resulting centroids as the colors of the palette. After that, each color is remapped to the nearest one among those in the palette, as in the random selection. Given the nature of k-means, the nearest color corresponds exactly to the centroid of the cluster the color belongs to.

It is important to notice that `n` pixels of an image having the same color correspond to `n` overlapping points in the color space, and not to a single point. In the case of random selection this makes no difference, but it makes a big difference when applying the k-means algorithm.

Remember that the Euclidean distance in the Lab space is uniform with respect to the difference perceived by the eye. Hence, with this kind of selection, we can expect that using the Lab space should give a visually better result than RGB.

Similarly to the previous figure, the following one visually explains the whole process in the case of the RGB space, but using the k-means algorithm. Also in this example the final palette is composed of three colors and the blue channel is fixed to the value `0` for simplicity. All the pixels of the image are represented with the RGB encoding and then mapped into the RGB space, as in the case of the random selection. Then the k-means algorithm is applied and the centroids of the clusters are selected to be part of the palette. Finally, as in the random palette selection, all the colors are remapped to the nearest selected color according to the Euclidean distance.

Here are the output images generated using k-means in the RGB color space. Their palettes are composed of `8`, `16`, `32` and `64` colors respectively.

Here instead are the output images generated using k-means in the Lab color space. Also in this case their palettes are composed of `8`, `16`, `32` and `64` colors respectively.

Here's the code to perform the color quantization through k-means. The following function has the same signature as the one used for random selection, and what has already been discussed regarding raster loading, color space conversion and result visualization still applies.

```
import numpy as np
from sklearn import cluster

def quantize(raster, n_colors):
    width, height, depth = raster.shape
    reshaped_raster = np.reshape(raster, (width * height, depth))

    model = cluster.KMeans(n_clusters=n_colors)
    labels = model.fit_predict(reshaped_raster)
    palette = model.cluster_centers_

    quantized_raster = np.reshape(
        palette[labels], (width, height, palette.shape[1]))

    return quantized_raster
```

We can clearly see that k-means outperforms simple random selection. In fact, the images obtained with random selection with `64` colors are very similar to those obtained with k-means with `32` colors, independently from the color space used. The drawback of using k-means is that it is obviously much slower, especially if the training is performed with all the pixels of the original image. This can be mitigated by using only a random sample of the pixels.
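One possible sketch of this mitigation (a hypothetical variant, not the code used above): fit k-means on a random sample of the pixels, then assign every pixel to its nearest centroid.

```python
import numpy as np
from sklearn import cluster
from sklearn.utils import shuffle

def quantize_sampled(raster, n_colors, sample_size=1000):
    # Fit k-means on a random sample of pixels instead of all of them,
    # then remap every pixel to its nearest centroid.
    width, height, depth = raster.shape
    pixels = np.reshape(raster, (width * height, depth))

    sample = shuffle(pixels, random_state=0)[:sample_size]
    model = cluster.KMeans(n_clusters=n_colors, n_init=10).fit(sample)

    labels = model.predict(pixels)
    palette = model.cluster_centers_
    return np.reshape(palette[labels], (width, height, depth))
```

The larger the sample, the closer the resulting palette gets to the one obtained by clustering the full image.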

On the other hand, at first glance the results of k-means in the RGB space and in the Lab space look similar. However, by taking a closer look at each pair of images, in some cases it is possible to notice details that make one look better than the other. In general, for this particular image, the face seems to be clearer and more defined when using RGB, and this can be noticed especially in the first two pairs of images. However, using the Lab space seems to render the reflection in the mirror and the hat better, and it also better preserves the brightness of the original image. Overall I wouldn't say that, when using k-means, one color space is a better choice than the other, at least not for this particular image.
