2014, No. 01, Vol. 3, pp. 1-12
Research Progress on Statistical Correlation Analysis Methods
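Abstract (from the Chinese):

This paper systematically reviews the statistical correlation methods in common use from the 19th century to the present, such as the Pearson and Spearman correlation coefficients, the CorGc and CovGc correlation measures, and distance correlation. It focuses on the MIC method proposed in 2011 and the large body of commentary, both favorable and critical, that it provoked, with the aim of portraying the state of research in this active area. The field has attracted not only statisticians but also researchers in applied fields who analyze large-sample and heterogeneous data, such as genome biologists and network information researchers. These researchers hope to apply existing methods more appropriately by understanding and dissecting them, and to pose new applied problems that drive the creation of new analytical methods.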

Abstract:

Correlation analysis is a major research topic in both theoretical statistics and practical applications, and it has received increasing attention as the volume of data grows rapidly. This article reviews several commonly used methods, including the Pearson and Spearman correlations, which date to the late 19th and early 20th centuries, and CorGc and CovGc, introduced in the 21st century. In particular, we cover MIC, proposed in 2011, together with the positive and negative commentary it received, aiming to sketch the research topic as a whole. Correlation analysis methods play a key role in statistics, especially in analyzing large heterogeneous datasets such as complex information networks and genome-proteome datasets. This survey tries to provide some understanding of existing methods and their applications. We hope to encourage new applications, which in turn may promote the development of new methods.
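As a rough illustration of the two classical measures named in the abstract, the sketch below computes the Pearson coefficient and the Spearman rank coefficient with the standard library only. It is not taken from the article: the function names (`pearson`, `ranks`, `spearman`) and the sample data are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y is a monotone but nonlinear function of x.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson :", pearson(x, y))
print("Spearman:", spearman(x, y))
```

On monotone but nonlinear data such as this, the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1, which is one reason a review of dependence measures treats the two separately.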

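Basic information:

Chinese Library Classification number: O212.1

Citation:

[1] Fan Rong, Meng Dazhi, Xu Dashun. Research progress on statistical correlation analysis methods [J]. Mathematical Modeling and Its Applications (数学建模及其应用), 2014, 3(01): 1-12.

Release date:

2014-02-15

Publication date:

2014-02-15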
