Nonparametric Measures of Dependence

Organizers: Bodhi Sen and Tian Zheng

Columbia University's Department of Statistics is organizing a 3-day workshop and a 1-day conference on Nonparametric Measures of Dependence. The conference is open to the public, but please register here to help us keep track of numbers.

Workshop

Date: April 28th -- April 30th, 2014

Location: School of Social Work (SSW) Building, 9th and 10th floors, 1255 Amsterdam Avenue, New York, NY 10027 (map)

Arthur Gretton -- Kernel Methods for Comparing Distributions and Detecting Dependence

Lecture 1: Fundamentals of RKHS

Abstract: I will begin with a brief introduction to what can be achieved in hypothesis testing using kernel approaches. The remainder of this lecture will cover the fundamentals of RKHS theory, which will be necessary for understanding the next two lectures (and, hopefully, for explaining why RKHS theory has been so widely used in machine learning). I will give an overview of the main concepts and how they interact (positive definiteness, boundedness of the evaluation operator, smoothing via the RKHS norm). Time permitting, I will discuss the Moore-Aronszajn theorem, which uniquely associates each RKHS with a positive definite kernel.
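As a small illustration of the first of these concepts, the sketch below (our own example, not material from the lecture; scalar inputs and a Gaussian kernel are assumed) spot-checks the defining property of positive definiteness, c^T K c >= 0, on a kernel Gram matrix:

```python
import math
import random

def gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = k(x_i, x_j) for the Gaussian kernel
    k(x, y) = exp(-(x - y)^2 / (2 sigma^2)), with scalar inputs."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in xs]
            for a in xs]

# Positive definiteness means c^T K c >= 0 for every coefficient vector c;
# we spot-check the quadratic form with random coefficients.
rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(8)]
K = gram(xs)
for _ in range(100):
    c = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    q = sum(c[i] * K[i][j] * c[j] for i in range(8) for j in range(8))
    assert q >= -1e-9  # nonnegative up to floating-point error
```

By contrast, the same check with k(x, y) = -|x - y| fails already for two distinct points and c = (1, 1), since the quadratic form is then -2|x1 - x2| < 0; that function is not a positive definite kernel.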

Lecture 2: Embeddings of probability measures into an RKHS

Abstract: I will describe how probability measures may be embedded into an RKHS, and the pseudo-metric induced by these embeddings. This pseudo-metric is shown to be a metric, and the embedding to be injective, when the kernel is characteristic. I will give a number of conditions that can be used to prove that a kernel has the characteristic property, with emphasis on a simple Fourier argument. I will cover hypothesis testing, including a study of local departures from the null and a discussion of optimal kernel choice. The kernel distances may be situated in relation to other metrics used in statistics and probability: I will describe how they relate to L2 distances between Parzen window estimates, integral probability metrics, and energy distances.
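To make the embedding distance concrete, here is a minimal sketch (our own illustration, not code from the lecture; scalar data and a Gaussian kernel are assumed) of the biased sample estimate of the squared maximum mean discrepancy (MMD), i.e. the squared RKHS distance between the two empirical mean embeddings:

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) for scalars."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys:
    ||mean embedding of xs - mean embedding of ys||^2 in the RKHS,
    expanded into three Gram-matrix averages."""
    m, n = len(xs), len(ys)
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy

# Identical samples give MMD^2 = 0; well-separated samples give a value
# close to the maximum of 2 for this kernel.
same = mmd2_biased([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
far = mmd2_biased([0.0, 0.1, 0.2], [10.0, 10.1, 10.2])
```

A two-sample test would compare such a statistic against a null distribution, e.g. obtained by permuting the pooled sample.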

Lecture 3: Dependence measures using RKHS embeddings

Abstract: I will approach dependence via two routes: as the distance between the embedding of a joint probability and that of the product of its marginals, and in terms of a covariance operator between mappings of the random variables to reproducing kernel Hilbert spaces. The latter gives a simpler interpretation and relates more clearly to classical measures of dependence. As with energy distances, the Hilbert-Schmidt norm of the covariance operator can be interpreted as a distance covariance. A more powerful test of dependence can be obtained by replacing covariance operators with correlation operators, giving as one measure the kernel canonical correlation. Interestingly, correlation operators can be used to obtain an estimate of the chi-squared statistic which is asymptotically independent of the RKHS. Time permitting, I may cover some more advanced topics from work published in the last year (e.g., detecting interactions between triplets of variables, testing for dependence between time series, ...).
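The Hilbert-Schmidt norm of the empirical covariance operator reduces to a computation on centered Gram matrices. The sketch below (an illustrative reconstruction under our own naming, not code from the lecture) computes the standard biased estimator trace(KHLH)/n^2, with H = I - (1/n)11^T, for scalar samples:

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian kernel for scalar inputs."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def hsic_biased(xs, ys, sigma=1.0):
    """Biased estimate of the Hilbert-Schmidt Independence Criterion:
    the squared HS norm of the empirical cross-covariance operator,
    computed as trace(Kc Lc) / n^2 with Kc = HKH, Lc = HLH."""
    n = len(xs)
    K = [[rbf(a, b, sigma) for b in xs] for a in xs]
    L = [[rbf(a, b, sigma) for b in ys] for a in ys]

    def center(M):
        # Subtract row means, column means, and add back the grand mean.
        row = [sum(r) / n for r in M]
        col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
        tot = sum(row) / n
        return [[M[i][j] - row[i] - col[j] + tot for j in range(n)]
                for i in range(n)]

    Kc, Lc = center(K), center(L)
    # trace(Kc Lc) / n^2 (both matrices are symmetric).
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n)) / n ** 2
```

If ys is constant its Gram matrix centers to zero, so the statistic is exactly 0; perfectly coupled samples give a strictly positive value.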

Gabor Szekely -- Brownian Distance Covariance and Energy Statistics

Lecture 1. Distance Correlation

Abstract: From correlation (Galton/Pearson, 1895) to distance correlation (Szekely, 2005). Important measures of dependence and how to classify them via invariances. The distance correlation t-test of independence. Open problems for big data.
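For readers new to distance correlation, a minimal sketch of the sample version for scalar data (our own illustration, not material from the lecture): double-center each pairwise distance matrix, then normalize their inner product.

```python
import math

def dcor(xs, ys):
    """Sample distance correlation: double-center the pairwise distance
    matrix of each sample, then correlate the centered matrices."""
    n = len(xs)

    def centered(zs):
        d = [[abs(a - b) for b in zs] for a in zs]  # pairwise distances
        row = [sum(r) / n for r in d]               # row means (= col means)
        tot = sum(row) / n                          # grand mean
        return [[d[i][j] - row[i] - row[j] + tot for j in range(n)]
                for i in range(n)]

    A, B = centered(xs), centered(ys)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvarx = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvary = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    if dvarx * dvary == 0.0:
        return 0.0  # a constant sample has zero distance variance
    dcov2 = max(dcov2, 0.0)  # guard against tiny negative rounding error
    return math.sqrt(dcov2 / math.sqrt(dvarx * dvary))
```

A linear relationship yields dcor = 1; the celebrated property is that the population quantity is 0 exactly when the variables are independent.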

Tuesday (29 April): 2:00 pm -- 4:00 pm (1025 SSW)

Lecture 2. Energy Statistics (E-statistics) and their Application

Abstract: The statistical energy background of distance correlation and other applications of statistical energy: testing for symmetry, testing for normality, DISCO analysis, energy clustering, etc. A simple inequality on energy statistics and a beautiful theorem of Fourier transforms. What makes a statistic U (or V)?

Lecture 3. Brownian Correlation

Abstract: Correlation with respect to stochastic processes. Distances and negative definite functions. Physics principles in statistics (the uncertainty principle of statistics, symmetries/invariances, equilibrium estimates). A CLT for dependent variables via Brownian correlation. What if the sample is not iid? What if the sample comes from a stochastic process?

Conference

Date: May 2nd, 2014

Location: School of Social Work (SSW) Building, Room 903, 1255 Amsterdam Avenue, New York, NY (map)

Schedule:

09:00 am -- 09:30 am Opening remarks and breakfast
09:30 am -- 11:45 am Session I
01:15 pm -- 03:30 pm Session II: Junior Researchers Session
04:00 pm -- 05:30 pm Session III
End-of-term party and dinner

Titles and Abstracts:

Arthur Gretton: Kernel tests of homogeneity, independence, and multi-variable interaction

Abstract: We consider three nonparametric hypothesis testing problems: (1) given samples from distributions p and q, a homogeneity test determines whether to accept or reject p = q; (2) given a joint distribution p_xy over random variables x and y, an independence test investigates whether p_xy = p_x p_y; (3) given a joint distribution over several variables, we may test whether there exists a factorization (e.g., P_xyz = P_xy P_z, or, for the case of total independence, P_xyz = P_x P_y P_z). We present nonparametric tests for the three cases above, based on distances between embeddings of probability measures into reproducing kernel Hilbert spaces (RKHS), which constitute the test statistics (e.g., for independence, the distance is between the embedding of the joint distribution and that of the product of the marginals). The tests benefit from many years of machine learning research on kernels for various domains, and thus apply to distributions on high-dimensional vectors, images, strings, graphs, groups, and semigroups, among others. The energy distance and distance covariance statistics are also shown to fall within the RKHS family when semimetrics of negative type are used. The final test (3) is of particular interest, as it may be used to detect cases where two independent causes individually have weak influence on a third, dependent variable, but their combined effect has a strong influence, even when these variables have high dimension.

Andrey Feuerverger: On Consistent Nonparametric Tests for Dependence

Abstract: Modern applications and current volumes of data require new approaches to the problem of testing for dependence. Such needs arise in financial engineering, copula modeling, and many other areas where subtle dependence structures may be an issue. These applications call for tests which have demonstrably high power and which are consistent against all alternatives to independence. It turns out that tests constructed carefully in the Fourier domain have such desirable properties, and also turn out to have suggestive and unexpectedly interesting functional forms. The emphasis of the talk will be on basic ideas and on how they can be extended to develop tests applicable to diverse dependence-testing contexts.

Michael Kosorok: Using Brownian Distance Covariance in Semi-nonparametric Inference

Abstract: In this work, we propose two flexible procedures which use Brownian distance covariance for semi-nonparametric hypothesis testing. The first procedure tests the general hypothesis of whether a certain set of covariates is associated with a right-censored failure time. The general procedure requires only weak assumptions and does not require estimation of the censoring probability. The second procedure tests the adequacy of a semi-nonparametric model in the context of smoothing spline ANOVA (SS-ANOVA). Specifically, the test evaluates whether a given SS-ANOVA model with p variables with main effects and a predefined set of interactions is sufficient, or if more terms are needed. The procedure can also test whether any interactions are needed at all. For both procedures, we use model-based permutation and bootstrap approaches to obtain critical values. Theory and simulation studies verify that both procedures preserve type-I error and have good power performance.
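Permutation calibration of the kind mentioned above is generic: shuffle one sample to break the pairing with the other, and recompute the statistic to approximate its null distribution. A sketch with an illustrative toy statistic (all names are ours, not from the paper):

```python
import random

def abs_cov(xs, ys):
    """Toy dependence statistic: absolute value of the sample covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return abs(sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n)

def permutation_pvalue(stat, xs, ys, n_perm=200, seed=0):
    """Permutation test: recompute stat on shuffled ys, which breaks any
    pairing between the samples while preserving both marginals."""
    rng = random.Random(seed)
    observed = stat(xs, ys)
    ys = list(ys)  # copy so the caller's data is not shuffled in place
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if stat(xs, ys) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0

# Strongly dependent pairs should give a small p-value.
xs = [float(i) for i in range(20)]
p = permutation_pvalue(abs_cov, xs, xs)
```

Any dependence statistic (distance covariance, HSIC, ...) can be plugged in for abs_cov; only the exchangeability of the pairs under the null is used.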

Jingyi (Jessica) Li: A New Statistical Measure for Identifying Sparse Non-functional Relationships between Pairwise Variables

Abstract: In genomic research, statistical measures of association serve as important tools for screening pairwise variables (e.g., genes) that exhibit specific relationships among thousands of variable pairs. Examples of classic association measures include Pearson correlation, Spearman correlation, and maximal correlation, which can identify linear, monotone, and functional relationships, respectively, in increasing order of generality. While these measures have demonstrated great power in screening pairwise variables in many research settings, there remain some sparse non-functional relationships (i.e., mixtures of a small number of functional relationships) that may also be of interest in some settings. In this talk, I will present ongoing work on the development of a new statistical measure for identifying certain types of sparse non-functional relationships between pairwise variables. The new measure is based on a generalized definition of conditional expectation and can be regarded as an extension of the classic coefficient of determination. We propose an estimator of this new measure that combines local regression and clustering frameworks. Consistency of this estimator is established. Simulation and real data studies demonstrate the effectiveness of this new measure in identifying different types of sparse non-functional relationships.

Shaw-Hwa Lo: Discovering Influential Variables followed by Interaction-based Learning: A Partition Retention (PR) Approach

Abstract: We consider a computer-intensive approach (PR, 09), based on an earlier method (Lo and Zheng (2002)), for detecting which, of many potential explanatory variables, have an influence on a dependent variable Y. This approach is suited to detecting influential variables in groups, where causal effects depend on the confluence of values of several variables. It has the advantage of avoiding a difficult direct analysis, involving possibly thousands of variables, guided by a measure of influence I. At this stage the objective is to discover those influential variables (in groups); we are confining our attention to locating a few needles in a haystack. After that, to deal with challenging real data applications, typically involving complex and extremely high dimensional data, we shall introduce an interaction-based feature selection and prediction procedure, using breast cancer gene expression data as an illustrative example. The quality of the variables selected is evaluated in two ways: first by classification error rates, then by functional relevance using external biological knowledge. We demonstrate that (1) the classification error rates can be significantly reduced, and (2) incorporating interaction information into data analysis can be very rewarding in generating novel scientific findings and models. If time permits, a heuristic explanation of why and when the proposed methods may lead to such a dramatic (classification/predictive) gain will be discussed.

Subhadeep Mukhopadhyay: LP Nonparametric Dependence Modeling

Abstract: The goal of this talk is to discuss a recent innovation in nonparametric dependence modeling that we (Deep and Parzen) have been developing. Several applications of this theory will be presented to show how LP theory permits the design of a "single" general algorithm that can simultaneously tackle different varieties of data types and data patterns. The connection with traditional and novel statistical methods will be mentioned. This is joint work with Emanuel Parzen.

David Reshef: Equitability and the Maximal Information Coefficient

Abstract: The maximal information coefficient (MIC) is a measure of dependence for finding the strongest pairwise relationships in a data set with many variables. MIC is useful not just for identifying deviations from statistical independence but also for the more delicate task of ranking relationships by strength, as it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important when the goal is to identify a relatively small set of strongest associations as opposed to as many non-trivial associations as possible, which are often too many to sift through. In this talk, we formally define equitability, as well as redefine MIC in a parameter estimation framework and introduce new algorithms for estimating it. We also present an extensive comparison of state-of-the-art measures of dependence together with a discussion of tradeoffs to consider in choosing an appropriate measure of dependence in various settings.

Bharath Sriperumbudur: Density Estimation in Infinite Dimensional Exponential Families

Abstract: In this work, we consider the problem of estimating densities in an infinite-dimensional exponential family indexed by functions in a reproducing kernel Hilbert space. Since standard techniques like maximum likelihood estimation (MLE) or pseudo-MLE (based on the method of sieves) do not yield practically useful estimators, we propose an estimator based on the minimization of the Fisher divergence, which involves solving a simple linear system. We show that the proposed estimator is consistent, and we provide convergence rates under smoothness assumptions (precisely, under the assumption that the true parameter, i.e., the function indexing the data-generating distribution, lies in the image of a certain covariance operator). We also empirically demonstrate that the proposed method outperforms the standard nonparametric kernel density estimator. Joint work with Kenji Fukumizu, Arthur Gretton, and Aapo Hyvarinen.

Gabor Szekely: Partial Distance Correlation

Abstract: What makes partial distance correlation difficult to define? Distance correlation and dissimilarities via unbiased distance covariance estimates. My Erlangen program in statistics. An important equality: what is wrong with the Mantel test? Variable selection via partial distance correlation. Unsolved problems when 0 < dcor < 1. What is a good measure of dependence? How strong can the dependence of uncorrelated variables be? Why not maximal correlation? Dependence and complexity. Why distance correlation?
