Conference - Statlearn: Challenging problems in statistical learning

This year, the workshop is preceded by a one-day spring school on April 6th, offering three tutorials on "Statistical learning and non-vectorial data". The number of participants is limited to 50, and the tutorials are mainly (but not exclusively) intended for PhD candidates and young researchers. The program is as follows:

Wednesday, April 6th, 2016

10h-12h Tutorial 1: Statistical learning on graphs, P. Latouche, univ. Paris 1 / SAMM

Graphs are widely used to characterize relationships between objects of interest. They are based on a straightforward formalism yet can describe complex systems of interactions. In the private sector, the development of Facebook and Twitter has drawn attention to so-called social graphs. The terms "graph" and "network" are synonymous, although in practice "graph" usually refers to the mathematical object while "network" is used for data from real-world applications. Since the work of Moreno in the 1930s, many approaches have been proposed to extract information from these structured data sets. Given a network, the goal is to uncover groups of nodes sharing homogeneous connection patterns. The original methods appeared in the social sciences. Then, in the 1990s, researchers from physics started working in the field. In particular, the modularity score was introduced. Since the 1980s, statisticians have also derived their own approaches, notably with the development of the stochastic block model, which influenced the entire community and is now widely used in practice. In this course, I will review the modularity-based methods. I will also cover random graph models, focusing on the latent position cluster model and the stochastic block model. Finally, inference techniques for these models, used for clustering and parameter estimation, will be investigated. In particular, I will talk about the variational expectation maximization (VEM) algorithm, the variational Bayes EM algorithm, Gibbs sampling, greedy search, and model selection. Examples (in R) of studies of real networks will be provided and R packages will be presented. Participants are therefore encouraged to bring their own computers with a version of R and the following packages installed: "igraph", "sna", "latentnet", "mixer".

14h-16h Tutorial 2: Statistical learning, functional data and time series, J. Jacques, univ. Lyon / ERIC

Functional data were long out of reach for statistics because of technological limitations, but it is now increasingly easy to observe, store and process large amounts of such data in medicine, economics, chemometrics and many other domains. In this tutorial, the main tools for dealing with functional data will be presented and applied using the R software. The outline of the tutorial is: 1. data preprocessing (from discrete observations to functional data), 2. descriptive statistics (functional PCA, clustering), 3. predictive models (regression, classification). Participants are encouraged to bring their own computers with a version of R and the following packages installed: "fda", "funFEM".

16h-18h Tutorial 3: Topic models, J. Velcin, univ. Lyon / ERIC

Topic models constitute a major tool for dealing with textual data and beyond (e.g., images, meta-data, etc.). They are used by many different communities, from data mining to the social sciences and humanities. The popular Latent Dirichlet Allocation (LDA) model is integrated into different platforms and programming languages (Java, R, Python...), although its theoretical basis and limitations are often not fully understood by its users. In this tutorial, I will give an overview of the main applications of topic modeling and focus on various aspects of LDA (probabilistic model, parameter estimation, role of priors, choice of features). During the session, the audience will have the opportunity to test the model on their own textual datasets using the R packages "rJava" and "mallet".