Amir Khalilian

Masked Non-negative Matrix Factorization

  1. Introduction
  2. The non-negative matrix factorization problem
  3. Masked non-negative matrix factorization
  4. Example: Hubble images with missing pixels
  5. A note on implementation in Matlab
  6. References


Non-negative matrix factorization is a powerful tool for dimensionality reduction and data analysis. The goal in matrix factorization, in general, is to summarize a given data matrix with two low-rank factors. In this non-negative matrix factorization (NMF), we require the factor matrices to have only non-negative elements. This is in contrast to some other factorization methods, such as principle component analysis (PCA). In this brief article, we are interested in scenarios where not all the elements in the given data matrix are measured or valid. In this case, we want to "skip" these missing elements when solving the optimization problem and find the factor matrices without considering the error on those elements.

The non-negative matrix factorization problem

Given the non-negative data matrix DR0m×nD\in\mathbb{R}^{m\times n}_{\geq 0} we want to find matrices WR0m×rW\in\mathbb{R}^{m\times r}_{\geq 0} and HR0r×nH\in\mathbb{R}^{r\times n}_{\geq 0} such that DWHD\approx WH. The optimization problem can be written as

W,H=argminW,HDWHF2s.t.WRm×r,W0 and HRr×n,H0. W^{\ast}, H^{\ast} = \arg\min_{W,H} \|D-WH\|_{F}^2 \quad \text{s.t.} \quad W\in\mathbb{R}^{m\times r},\quad W\geq 0 \text{ and } H \in \mathbb{R}^{r\times n},\quad H\geq 0.

An iterative algorithm

The following algorithm, called Lee and Seung's multiplicative update rule, solves the NMF problem.

H[i,j]n+1H[i,j]n((Wn)TV)[i,j]((Wn)TWnHn)[i,j]{\displaystyle \mathbf {H} _{[i,j]}^{n+1}\leftarrow \mathbf {H} _{[i,j]}^{n}{\frac {((\mathbf {W} ^{n})^{T}\mathbf {V} )_{[i,j]}}{((\mathbf {W} ^{n})^{T}\mathbf {W} ^{n}\mathbf {H} ^{n})_{[i,j]}}}} W[i,j]n+1W[i,j]n(V(Hn+1)T)[i,j](WnHn+1(Hn+1)T)[i,j].{\displaystyle \mathbf {W} _{[i,j]}^{n+1}\leftarrow \mathbf {W} _{[i,j]}^{n}{\frac {(\mathbf {V} (\mathbf {H} ^{n+1})^{T})_{[i,j]}}{(\mathbf {W} ^{n}\mathbf {H} ^{n+1}(\mathbf {H} ^{n+1})^{T})_{[i,j]}}}}.

One drawback of this algorithm is that when an entry of WW or HH is initialized as zero or positive, it remains zero or positive throughout the iterations. Therefore, all entries should be initialized to be positive and sparsity constraints are not simply added to the algorithm unless via hard-thresholding.

Alternating non-negative least squares algorithm

Another algorithmic solution can be achieved via alternating non-negative least squares (ANLS). Assume we have WW fixed then we get the following optimization for HH:

argminHDWHF2s.t.HRr×n,H0,\arg\min_{H} \|D-WH\|_{F}^2 \quad \text{s.t.} \quad H \in \mathbb{R}^{r\times n},\quad H\geq 0,

which is the least squares problem for HH with a non-negativity constraint. The solution to this problem can be determined by solving WWH=WDW^\top W H=W^\top D for HH followed by a projection onto R0\mathbb{R}_{\geq 0}. We can also use the method proposed in this paper to solve this problem. The code can be found here. Now for fixed HH we can write:

argminWDHWF2s.t.WRm×r,W0,\arg\min_{W} \|D^\top-H^\top W^\top\|_{F}^2 \quad \text{s.t.} \quad W \in \mathbb{R}^{m\times r},\quad W\geq 0,

This can be minimized by solving HHW=HDH H^\top W^\top = H D^\top for WW, followed by the projection onto R0\mathbb{R}_{\geq 0}. It is also recommended that we normalize the columns of WW to unit 2\ell_2-norm and scale the rows of HH accordingly after each iteration. Using changes in DWHF\|D-WH\|_F per iteration, we can define the stopping criteria.

Masked non-negative matrix factorization

In real application, we might have a case where the columns of DD represent different data-points and each row represents a separate features. In this case, we might have a situation where some of the features are missing from the dataset. In the NMF formulation, we do not want to consider the error over these missing values. In other words, we only wanna consider the error on the elements in DD that are valid or measured and discard the error on the rest. It is fair to assume that we know which elements are missing and use this information when formulating the problem.

Let M{0,1}m×nM\in\{0,1\}^{m\times n} be the masking matrix such that Mij=0M_{ij}=0 iff DijD_{ij} is missing, otherwise Mij=1M_{ij}=1. In this case we only want to consider the error on measured signals. So we get,

W,H=argminW,HM(DWH)F2s.t.WRm×r,W0 and HRr×n,H0, W^{\ast}, H^{\ast} = \arg\min_{W,H} \|M\circ(D-WH)\|_{F}^2 \quad \text{s.t.} \quad W\in\mathbb{R}^{m\times r},\quad W\geq 0 \text{ and } H \in \mathbb{R}^{r\times n},\quad H\geq 0,

where \circ denotes the element-wise multiplication. The loss function in this case only considers the error in elements of DD where MM is non-zero.

How do we add the Masking to an iterative solver?

We can apply the ANLS algorithm to the masked problem. When we fix WW we get the following optimization over HH,

argminHM(DWH)F2s.t.HRr×n,H0.\arg\min_{H} \|M\circ(D-WH)\|_{F}^2 \quad \text{s.t.} \quad H \in \mathbb{R}^{r\times n},\quad H\geq 0.

Following the ANLS method we have to solve WM(WH)=W(MD)W^\top M\circ(W H)=W^\top (M\circ D) for HH followed by a projection onto R0\mathbb{R}_{\geq 0}. The derivation here shows the solution without the non-negative constraint. Let M=[m1, ,mn]M=[m_1,\cdots,m_n] where mjm_j is the jj-th column of MM. Then, these equations can be solved for each column of HH separately by solving the linear system Wdiag(mj)W hj=Wdiag(mj) djW^\top \mathrm{diag}(m_j) W\,h_j=W^\top \mathrm{diag}(m_j)\,d_j for hjh_j. We note that this can still be solved by nnlsm_blockpivot.m method from this paper. When an element in mjm_j is zero, we remove the corresponding row from the linear system (technically there is no need to do this and we can directly form Wdiag(mj)WW^\top \mathrm{diag}(m_j) W and Wdiag(mj) djW^\top \mathrm{diag}(m_j)\,d_j and solve the linear system, however, by removing the zeros we make sure that Wdiag(mj)WW^\top \mathrm{diag}(m_j) W does not become unnecessarily ill-conditioned).

Now for fixed HH we can write:

argminWM(DHW)F2s.t.WRm×r,W0,\arg\min_{W} \|M^\top\circ(D^\top-H^\top W^\top)\|_{F}^2 \quad \text{s.t.} \quad W \in \mathbb{R}^{m\times r},\quad W\geq 0,

which is solved by the system of equations H M(HW)=HDH\,M^\top \circ (H^\top W^\top) = H D^\top for WW followed by a projection onto R0\mathbb{R}_{\geq 0}. Similarly, this can be diagonalized for rows of the matrix WW and solved via the same method.

Example: Hubble images with missing pixels

The Hubble images dataset consists of 100 images of the Hubble telescope where each image highlights different materials. For example, five samples of these images are

Each image is converted into a column vector and is stored in the our data matrix D. In this case, D is a matrix of size 16384 x 100 (note that each image is 128x128). To simulate the missing pixels, we compute the masking matrix by randomly choosing 60% of the pixels.

D = ... % load from Hubble.mat
% create a random mask -- keep p% of enteries
p = 0.6;
M = rand(size(D))<=p;
% data with missing enteries
MD = M.*D;

The sample images when we set the missing pixels to zero are

If we just call the nnmf function in Matlab with the matrix MD we get the NMF solution without considering the mask. Lets assume we consider rank to be eight!

% set a rank
r = 8;
% run nnmf without masking
[W_nmf,H_nmf] = nnmf(MD, r);
% recovered estimated data
D_hat_nmf = W_nmf*H_nmf;

The recovered sample images are

Now lets consider the solution via masked-nnmf. In this case, we need to provide the masking matrix M to the solver as well as the desired rank r.

% run the solver
[W,H] = masked_nnmf(MD, M, r,...
                    'init_mode', 'rand',...
                    'maxiter', 250);
% recovered estimated data
D_hat = W*H;

The recovered sample images in this case are

As we can see, by ignoring the missing data, we can exploit the low-rank structure of the original data matrix and recover the missing pixels. We can further look at the factors that are found by each algorithm. Note that the W matrices are of size 16384 x 8 and can be viewed as eight different factor images for each method. The factor images from nnmf and masked-nnmf are

We see that the method considering the masking better finds the underlying low-rank structure.

A note on implementation in Matlab

I implemented this algorithm in Matlab and you can find the code here. The core solver is the masked-nnmf function. Here is a brief overview of the main iteration.

The solver function receives three main inputs:

The main loop consists of an update over HH followed by an update over WW. Given the matrix M we can precompute the rows in WjW_j and djd_j that we need to use when solving for hjh_j. Similarly, we find columns to use in each row of did_i and HiH_i when solving for rows wiw_i. We write the update iteration as:

% update H
WT = W';
parfor (j = 1:size(D,2), num_cores) % loop over columns of D
	WT_mj_dj = transpose(sum(W(rows2use{j},:).*D(rows2use{j},j),1));
	WT_mj_W = WT(:,rows2use{j}) * W(rows2use{j},:);
	H(:,j) = nnlsm_blockpivot(WT_mj_W, WT_mj_dj, true);

% update W
HT = H';
parfor (i = 1:size(D,1), num_cores) % loop over rows of D
	H_mi_di = sum(H(:,cols2use{i}).*D(i,cols2use{i}),2);
	H_mi_H = H(:,cols2use{i}) * HT(cols2use{i},:);
	W(i,:) = nnlsm_blockpivot(H_mi_H, H_mi_di, true);