Project 1 - Predicting Exoplanets

David Kinney - DSS 680 - Spring 2021 - Professor Catherine Williams

clean data

# Create subsets based on category
transit_columns = ['koi_period', 'koi_time0bk', 'koi_time0', 'koi_impact',
                   'koi_duration', 'koi_depth', 'koi_ror', 'koi_srho',
                   'koi_fittype', 'koi_prad', 'koi_sma', 'koi_incl', 'koi_teq',
                   'koi_insol', 'koi_dor', 'koi_limbdark_mod', 'koi_ldm_coeff2',
                   'koi_ldm_coeff1', 'koi_parm_prov']
tce_columns = ['koi_max_sngle_ev', 'koi_max_mult_ev', 'koi_model_snr',
               'koi_count', 'koi_num_transits', 'koi_tce_plnt_num',
               'koi_tce_delivname', 'koi_quarters', 'koi_trans_mod',
               'koi_datalink_dvr', 'koi_datalink_dvs']
stellar_columns = ['koi_steff', 'koi_slogg', 'koi_smet', 'koi_srad',
                   'koi_smass', 'koi_sparprov']
kic_columns = ['ra', 'dec', 'koi_kepmag', 'koi_gmag', 'koi_rmag', 'koi_imag',
               'koi_zmag', 'koi_jmag', 'koi_hmag', 'koi_kmag']
pixel_columns = ['koi_fwm_sra', 'koi_fwm_sdec', 'koi_fwm_srao', 'koi_fwm_sdeco',
                 'koi_fwm_prao', 'koi_fwm_pdeco', 'koi_fwm_stat_sig',
                 'koi_dicco_mra', 'koi_dicco_mdec', 'koi_dicco_msky',
                 'koi_dikco_mra', 'koi_dikco_mdec', 'koi_dikco_msky']

# Select each category's columns from the cleaned KOI table
df_transit = df_koi_cleaned[transit_columns]
df_tce = df_koi_cleaned[tce_columns]
df_stellar = df_koi_cleaned[stellar_columns]
df_kic = df_koi_cleaned[kic_columns]
df_pixel = df_koi_cleaned[pixel_columns]

# %% pandas profiling
import warnings
from datetime import datetime

from pandas_profiling import ProfileReport

# Suppress SettingWithCopyWarning message generated by pandas-profiling
# (note: this filter silences every warning, not only that one)
warnings.simplefilter(action='ignore')

profile = ProfileReport(df_koi_cleaned, title="Pandas Profiling Report")
# profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

# Timestamp the output file so repeated runs do not overwrite each other
pfile = "profile_report_{}.html".format(datetime.now().strftime('%m%d%y%H%M'))
profile.to_file(pfile)

prepare data

Dimensionality Reduction

# %% correlation matrix after dimensionality reduction
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import rcParams

rcParams['figure.figsize'] = 20, 14
plt.matshow(df_features.corr())
# Label both axes with the feature names
plt.yticks(np.arange(df_features.shape[1]), df_features.columns)
plt.xticks(np.arange(df_features.shape[1]), df_features.columns, rotation='vertical')
plt.colorbar()
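
The notebook does not show how the feature set was reduced before this plot. One common approach, sketched here purely as an assumption, is to drop one column from every pair whose absolute correlation exceeds a threshold. The function name `drop_correlated`, the 0.9 threshold, and the synthetic data are all hypothetical and do not come from the original analysis.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from each pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is seen once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic stand-in data: 'b' is nearly a copy of 'a', 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.01, size=200),
    'c': rng.normal(size=200),
})
reduced = drop_correlated(demo)
print(list(reduced.columns))
```

Because only the upper triangle is inspected, the later column of a correlated pair is the one removed, so the result depends on column order.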

Pardon the interruption...

pycaret

train baseline model

AutoML

confusion matrix
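
As a minimal sketch of what a confusion matrix reports, the snippet below fits a simple classifier on synthetic three-class data and tabulates true against predicted labels. The data, the logistic regression model, and the variable names are stand-ins chosen for illustration; they are not the model or data used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for a multi-class target.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes;
# the diagonal counts correct predictions.
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)
```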

Random search over hyperparameters: 3-fold cross-validation, 100 sampled combinations, all available cores.
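
The search described above maps onto scikit-learn's `RandomizedSearchCV` with `cv=3`, `n_iter=100`, and `n_jobs=-1`. The sketch below assumes a random forest classifier and a made-up search space, since the notebook lists neither the estimator nor the grid; `random_search` is a hypothetical helper name.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV


def random_search(X, y, n_iter=100, cv=3, random_state=42):
    """Randomized hyperparameter search; returns the fitted search object."""
    # Hypothetical search space: the source does not list the one it used.
    param_distributions = {
        'n_estimators': randint(10, 60),
        'max_depth': [None, 5, 10, 20],
        'min_samples_split': randint(2, 11),
        'min_samples_leaf': randint(1, 5),
        'max_features': ['sqrt', 'log2'],
    }
    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(random_state=random_state),
        param_distributions=param_distributions,
        n_iter=n_iter,      # number of sampled parameter combinations
        cv=cv,              # folds per combination
        n_jobs=-1,          # use all available cores
        random_state=random_state,
    )
    return search.fit(X, y)


# Synthetic stand-in data. n_iter is cut to 5 only to keep this demo quick;
# the function default of 100 matches the description above.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
search = random_search(X, y, n_iter=5)
print(search.best_params_)
```

With the defaults, the search fits 100 combinations times 3 folds, so 300 models plus one final refit on the full training data.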

Model with Random Search CV Params

TPOT is an open-source Python library for AutoML. It uses scikit-learn for data transforms and machine learning algorithms, and a genetic programming search (a stochastic global search procedure) to discover a top-performing model pipeline for a given dataset. [1]
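
A minimal sketch of such a search, assuming the classic TPOT API (versions before 1.0) in which `TPOTClassifier` takes `generations` and `population_size`. The helper names `fit_tpot` and `evaluate` and all parameter values are hypothetical; the settings actually used in this project are not shown in the notebook.

```python
from sklearn.metrics import accuracy_score


def fit_tpot(X_train, y_train, generations=5, population_size=20, random_state=42):
    """Run a small TPOT search and return the fitted TPOTClassifier."""
    # Imported lazily so this module still loads where TPOT is not installed.
    from tpot import TPOTClassifier

    tpot = TPOTClassifier(
        generations=generations,          # number of evolutionary iterations
        population_size=population_size,  # candidate pipelines kept per generation
        cv=3,
        random_state=random_state,
        n_jobs=-1,
    )
    tpot.fit(X_train, y_train)
    return tpot


def evaluate(model, X_test, y_test):
    """Accuracy of any fitted model exposing predict(), TPOT included."""
    return accuracy_score(y_test, model.predict(X_test))
```

Calling `fit_tpot(X_train, y_train)` and then `evaluate(model, X_test, y_test)` gives a held-out accuracy. In the classic API the winning scikit-learn pipeline is available as `fitted_pipeline_` after fitting. Small values for `generations` and `population_size` keep run time manageable at the cost of a shallower search.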

AdaBoost Classifier
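
For orientation, a minimal AdaBoost sketch with scikit-learn on synthetic data. The hyperparameter values and the data are assumptions for illustration only, not the configuration used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic three-class stand-in data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# AdaBoost fits weak learners (decision stumps by default) in sequence,
# reweighting the samples that earlier learners misclassified.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
scores = cross_val_score(ada, X, y, cv=3)
print(scores.mean())
```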

SVM
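
A minimal SVM sketch, assuming an RBF kernel and standardized inputs. Scaling matters here because SVMs are sensitive to feature magnitude, and measurements such as periods, depths, and temperatures sit on very different scales. The data and parameter values are stand-ins, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class stand-in data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X[:, 0] *= 1000.0  # exaggerate one feature's scale to mimic mixed units

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Putting the scaler inside the pipeline means it is fit on training data only,
# which avoids leaking test-set statistics into the model.
svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
svm.fit(X_train, y_train)
accuracy = svm.score(X_test, y_test)
print(accuracy)
```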

XGBoost

Multi-classification using Keras

**I did not get this to run, but it is definitely something I'd like to revisit once the semester is over.**