Work detail

Estimating performance of classifiers from dataset properties

Author: Mgr. Michal Todt
Year: 2018 - summer
Leaders: Mgr. Petr Polák MSc. Ph.D.
Consultants:
Work type: Economic Theory
Masters
Language: English
Pages: 108
Awards and prizes:
Link: https://is.cuni.cz/webapps/zzp/detail/191854/
Abstract: The following thesis explores the impact of a dataset's distributional properties on classification performance. We use Gaussian copulas to generate 1,000 artificial datasets and train classifiers on them: Generalized Linear Models, Distributed Random Forest, Extremely Randomized Trees and Gradient Boosting Machines, all via the H2O.ai machine learning platform accessed from R. We evaluate classification performance on these datasets and present empirical observations on how the distributional properties influence it. Secondly, we use the real Australian credit dataset and try to predict which classifier is likely to work best. The predicted performance of each method is based on penalizing the differences between the Australian dataset and the artificial datasets on which that method performed comparatively better; however, this approach failed to predict correctly.
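The abstract does not say which marginals or correlation structures were used, so the following is only a minimal sketch of how a Gaussian copula can produce one artificial classification dataset. It assumes Exponential feature marginals, a Bernoulli class label tied to the features through the last row and column of the correlation matrix, and uses Python with NumPy rather than the R/H2O.ai setup of the thesis. The function name `gaussian_copula_dataset` and its parameters `rate` and `positive_share` are hypothetical.

```python
import math
import numpy as np

def gaussian_copula_dataset(corr, n_rows, rate=1.0, positive_share=0.4, seed=0):
    """Draw one artificial dataset from a Gaussian copula (illustrative sketch).

    corr: (d+1)x(d+1) positive-definite correlation matrix; the last
          dimension drives the class label.
    Assumed marginals (hypothetical choices, not taken from the thesis):
      features ~ Exponential(rate), label ~ Bernoulli(positive_share).
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    d = corr.shape[0]

    # 1) Correlated standard normals Z ~ N(0, corr)
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_rows)

    # 2) Probability integral transform U = Phi(Z): uniform marginals,
    #    dependence structure of the Gaussian copula preserved
    phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    u = np.clip(phi(z), 1e-12, 1.0 - 1e-12)  # avoid log(0) at the extremes

    # 3) Inverse CDFs of the chosen marginals
    features = -np.log1p(-u[:, :-1]) / rate                  # Exponential(rate)
    labels = (u[:, -1] > 1.0 - positive_share).astype(int)  # Bernoulli(positive_share)
    return features, labels

# Usage: two features; feature 0 is strongly tied to the label (0.7)
corr = [[1.0, 0.5, 0.7],
        [0.5, 1.0, 0.3],
        [0.7, 0.3, 1.0]]
X, y = gaussian_copula_dataset(corr, n_rows=5000, seed=42)
print(X.shape, y.mean())
```

Varying the correlation matrix and the marginals across many such draws is one way to obtain a population of datasets whose distributional properties differ in a controlled manner, which is the kind of design the abstract describes.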

