What I do

I'm doing research in Machine Learning. More specifically, I am interested in algorithms for learning hierarchical representations of data. So-called "deep architectures" are one class of such models that we study at Yoshua Bengio's LISA lab.

Why are we interested in deep architectures?

I've compiled an annotated reading list on deep architectures. It is not meant to be comprehensive, but I tried to include most of the work that is relevant to the subject.

My work, as revealed by my publications (especially the more recent ones), can be summarized as follows: I am trying to "poke" a variety of deep architectures in many different ways in order to understand how and why they work. I have been especially interested in understanding the effect of unsupervised pre-training. To that end, we have advanced several hypotheses about its regularization and optimization effects (AISTATS'09 and work in progress). We have also put forward the hypothesis that pre-training can be harmful in certain scenarios and that there is a need for more semi-supervised (pre-)training (ICML'07).
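
To make the idea of unsupervised pre-training concrete, here is a minimal NumPy sketch of greedy layer-wise pre-training followed by a supervised stage. It is illustrative only: the tied-weight sigmoid autoencoders, the layer sizes, learning rates and synthetic data are my own assumptions for the sketch, not the setup used in the papers mentioned above.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 500 examples with 20 inputs and a binary label (all assumed).
X = rng.rand(500, 20)
y = (X[:, :10].sum(1) > X[:, 10:].sum(1)).astype(float)

def pretrain_layer(data, n_hidden, epochs=100, lr=0.1):
    """Fit one layer as a tied-weight sigmoid autoencoder (reconstruct its input)."""
    n_in = data.shape[1]
    W = rng.randn(n_in, n_hidden) * 0.1
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(data @ W + b)           # encode
        r = sigmoid(h @ W.T + c)            # decode with tied weights
        d_out = (r - data) * r * (1 - r)    # squared-error delta at the output
        d_hid = (d_out @ W) * h * (1 - h)   # backprop into the hidden layer
        dW = data.T @ d_hid + d_out.T @ h   # encoder + decoder contributions
        W -= lr * dW / len(data)
        b -= lr * d_hid.mean(0)
        c -= lr * d_out.mean(0)
    return W, b

# Greedy stage: each layer models the representation produced by the one below.
W1, b1 = pretrain_layer(X, 15)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = pretrain_layer(H1, 10)
H2 = sigmoid(H1 @ W2 + b2)

# Supervised stage: for brevity only a logistic output layer is trained here;
# in practice the whole stack would then be fine-tuned jointly by backprop.
w, b = np.zeros(H2.shape[1]), 0.0
for _ in range(200):
    p = sigmoid(H2 @ w + b)
    w -= 0.5 * H2.T @ (p - y) / len(y)
    b -= 0.5 * (p - y).mean()

print("training accuracy:", ((sigmoid(H2 @ w + b) > 0.5) == y).mean())
```

The point of the sketch is the initialization story: the supervised stage starts from weights shaped by the unlabelled data rather than from random values, which is where the regularization and optimization questions come in.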

I'm also interested in better understanding the solution learned by a deep network: to this end, I've been looking at ways to "visualize" an arbitrary unit of a deep network. Our latest tech report analyzes this problem in depth and concludes that in many cases it is indeed possible to visualize filter-like features for units in the second and third layers.
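
One simple way to do this kind of visualization is gradient ascent in input space on a chosen unit's activation, keeping the input at a fixed norm. The sketch below is only illustrative: the tiny two-layer sigmoid network, its random weights and the hyperparameters are placeholders I made up; in an actual experiment the weights would come from a trained deep network.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in network with random weights (assumed MNIST-sized inputs).
n_in, n_h1, n_h2 = 784, 500, 500
W1, b1 = rng.randn(n_in, n_h1) * 0.01, np.zeros(n_h1)
W2, b2 = rng.randn(n_h1, n_h2) * 0.01, np.zeros(n_h2)

def unit_and_grad(x, unit):
    """Activation of one 2nd-layer unit and its gradient w.r.t. the input."""
    h1 = sigmoid(x @ W1 + b1)
    h2 = sigmoid(h1 @ W2[:, unit] + b2[unit])
    d_a2 = h2 * (1.0 - h2)                       # through the output sigmoid
    d_a1 = d_a2 * W2[:, unit] * h1 * (1.0 - h1)  # through the hidden layer
    return h2, W1 @ d_a1

def visualize(unit, steps=200, lr=1.0):
    """Gradient ascent on the unit's activation, input kept on the unit sphere."""
    x = rng.randn(n_in)
    x /= np.linalg.norm(x)
    for _ in range(steps):
        _, grad = unit_and_grad(x, unit)
        x += lr * grad
        x /= np.linalg.norm(x)      # project back to a fixed norm
    return x                        # the "filter-like" image for this unit

img = visualize(unit=3)
print(img.shape)                    # (784,) -- reshape to 28x28 to inspect
```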

Here's a summary of stuff that's ongoing, planned or simply very optimistic:

The ultimate goal is to write a thesis (hopefully by the end of 2010), and the plan is to wrap these ideas into one coherent story.

It is perhaps instructive to know what I don't do: robots! But I do find that robots are cool.