- Title:
- Controlling the effect of multiple testing in Big Data
- Authors:
- Denkowska, Sabina
- Links:
- https://bibliotekanauki.pl/articles/585035.pdf
- Publication date:
- 2014
- Publisher:
- Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
- Subjects:
- multiple testing; FDR; Big Data
- Description:
- Big Data poses a new challenge to statistical data analysis. The enormous growth of available data and its multidimensionality call into question the usefulness of classical methods of analysis. One of the most important stages in Big Data analysis is the verification of hypotheses and conclusions. As the number of hypotheses grows, each of them tested at a fixed significance level, the risk of erroneously rejecting true null hypotheses increases. Big Data analysts often deal with families of thousands, or even hundreds of thousands, of inferences. FWER-controlling procedures, recommended by Tukey [1953], are effective only for small families of inferences. For the large families of inferences found in Big Data analyses it is better to control FDR, that is, the expected value of the fraction of erroneous rejections among all rejections. The paper presents marginal multiple testing procedures that control FDR, as well as an interesting alternative: the joint, resampling-based multiple testing procedure MTP [Dudoit, van der Laan 2008]. Among the obvious advantages are a wide range of applications, a choice of Type I error rate, and easily accessible software (the MTP procedure is implemented in the R multtest package). Unfortunately, the analysis of the MTP procedure by Werft and Benner [2009] revealed problems with controlling FDR for large sets of hypotheses and small samples. The paper presents a simulation experiment conducted to investigate potential restrictions of the MTP procedure for large numbers of inferences and large sample sizes, which is typical of Big Data analyses. The experiment revealed that, regardless of the sample size, problems with controlling FDR occur when multiple testing procedures based on minima of unadjusted p-values are applied. Moreover, the experiment indicated serious instability of the MTP results (dependent on the number of bootstrap samplings) when multiple testing procedures based on minima of unadjusted p-values are used. The experiment described in the paper and the results obtained by Werft, Benner [2009] and Denkowska [2013] indicate the need for further research on the MTP procedure.
- Source:
- Mathematical Economics; 2014, 10(17); 5-16; 1733-9707
- Appears in:
- Mathematical Economics
- Content provider:
- Biblioteka Nauki
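
The description contrasts testing every hypothesis at one significance level with controlling FDR. A minimal Python sketch of both ideas follows. It assumes independent tests, and it uses the Benjamini-Hochberg step-up rule as one standard example of a marginal FDR-controlling procedure; the description does not say which marginal procedures the paper studies, so this choice, the function names and the sample p-values are illustrative assumptions. It is not the resampling-based MTP procedure from the R multtest package.

```python
def familywise_error_probability(m, alpha):
    """Chance of at least one false rejection among m independent true
    null hypotheses, each tested separately at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (illustrative choice, see lead-in).

    Returns a list of booleans in the input order: True where the
    hypothesis is rejected at target FDR level q.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(familywise_error_probability(1000, 0.05))
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042], q=0.05))
```

The first function shows why unadjusted testing breaks down as the family grows: the probability of at least one false rejection approaches 1. The second rejects the hypotheses with the smallest p-values up to the largest rank whose p-value falls under the linearly increasing threshold.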