Masters Theses

Masters Students

Barbora Pisecka (Fall 2022) pdf
Keywords: Aalbers Phone Use ESM Dataset, Predicting procrastination with phone use features

Abstract: As procrastination rates keep growing, researchers strive to understand its underlying mechanisms. Such understanding could help to prevent and thus limit this behaviour and its adverse effects. Multiple authors have previously investigated the relationship between smartphone use and procrastination; however, most earlier studies relied on surveys to capture smartphone use. This method has been questioned as it rarely reflects user smartphone behaviour accurately. Thus, this thesis used smartphone logs instead. While some other authors have also used phone logs to predict procrastination, no one has yet explored how sequential patterns of smartphone use might contribute to the prediction. Sequential patterns have previously proven useful for predicting other psychological states, such as mood or emotions. Using a dataset consisting of the smartphone logs of 231 users, this thesis tested how well sequential and non-sequential features can predict daily procrastination. This was evaluated for three classifiers: Decision Tree, Random Forest, and XGBoost. The best performance was achieved by a combination of sequential and non-sequential features and the XGBoost classifier. Thus, evidence was found that the sequential features complement non-sequential features. While certain limitations of the analysis were identified, the results still provide a promising starting point for future research.
Pieter van Brakel (Fall 2022) pdf
Keywords: Predicting fake news, Deep Learning, Public Data

Abstract: Fake news contains false information and can cause harm. Research shows that humans are vulnerable to fake news because they struggle to separate true from fake news. Automatic fake news detection can help with this separation. This research compares promising models from the scientific literature for classifying fake news articles based on news content. Several machine learning and deep learning algorithms are tested on the WELFake dataset, introduced by Verma et al. (2021). The Bi-directional RNN-LSTM achieves the highest classification accuracy, 97.19%, which is higher than that of the state-of-the-art classifier. Further research is required before a fake news detection classifier can be implemented on a larger scale.
Sarina Kasiemkhan (Fall 2022) pdf
Keywords: Time-series modeling, Proprietary Data from PostNL

Abstract: This thesis provides insights into the performance of Facebook’s Prophet algorithm, SARIMA, and the moving average model for predicting parcel volume per area per day. In the available literature, these models have not been compared in this specific application before. The dataset used in this study is provided by PostNL and contains multiple time series data from January 2020 until November 2022. The comparisons conducted in this thesis reveal that Prophet generates more accurate predictions as evidenced by the error scores. Specifically, Prophet’s predictions are found to be very close to the actual values, indicating that it captures seasonality well. On the other hand, SARIMA is observed to not handle multiple seasonalities well, as it keeps predicting values close to the mean. Additionally, Prophet is found to adapt well to the lockdown period, indicating that it handles sudden changes well. However, SARIMA seems to be a better model when it comes to generalizing to other areas, as Prophet fits too well on the data it trained on.
Kiki Peeters (Fall 2022) pdf
Keywords: Cross-situational Word Learning Dataset, Predicting learning

Abstract: In artificial language learning, it is important to understand how individual performance can be predicted so that effective language learning methods can be designed. This study aimed to address this gap in the literature by comparing the performance of different models, namely the support vector machine, multilayer perceptron, and decision tree ensemble, on a data set collected by Hendrickson and Perfors (2018) in a study on cross-situational learning in a Zipfian environment. The data set consisted of individual word-object pairs, and the models were trained to predict the accuracy of these pairs. The accuracy is either 0 or 1, making the task for this master's thesis a binary classification task. The decision tree ensemble was found to be the best performing model, with an accuracy of 0.686 on the test set, outperforming the logistic regression baseline, the support vector machine, and the multilayer perceptron. In addition, no difference in performance was detected between the age categories youth (17-24), adults (25-64), and seniors (65+) using the decision tree ensemble. Furthermore, the decision tree ensemble was also used to identify the most important features that influence individual performance in artificial language learning. The features were selected based on the mean accuracy decrease. These features were found to be age, the type of experiment, and response time. In conclusion, this study has demonstrated that it is possible to predict individual performance in artificial language learning using a decision tree ensemble. The results of this study can be used to design more effective language learning methods by taking into account the individual factors that influence performance.
Agata Rapiej (Fall 2022) pdf
Keywords: Aalbers phone use dataset, Predicting Body Dysmorphic Disorder scores with phone use features

Abstract: This research used several binary classification models, namely K-Nearest Neighbor, Support Vector Machines, Random Forest, and Logistic Regression, to predict Body Dysmorphic Disorder (BDD) based on social media usage. Body Dysmorphic Disorder is a mental disorder characterized by a disturbing or debilitating concern with minor or imagined flaws in one's physical appearance. Although there has been a lot of research on predicting mental health disorders, the prediction of BDD has been neglected by many researchers, even though the disorder is very prominent among students. The literature used in this research mainly focuses on predicting depression and anxiety using social media, with many studies using different datasets and features. This research distinguishes itself by looking at social media usage and its effect on BDD among students, a topic that has not been researched before. The best model for predicting Body Dysmorphic Disorder turned out to be a Random Forest with an accuracy of 55.7% using theory-driven features. Finally, the Random Forest model was tested on separate groups by gender and achieved a higher accuracy for women than for men.
Rowanne Trapmann (Fall 2022) pdf
Keywords: Classifying images, Deep Learning, Adversarial Training, Public Data

Abstract: According to research, neural networks (NNs) and deep neural networks (DNNs) are particularly sensitive to adversarial attacks, which arbitrarily modify the network’s output (Madry, Makelov, Schmidt, Tsipras, & Vladu, 2017). These altered outputs can be of great danger. This study aimed to investigate the impact of data augmentation techniques and adversarial training on the classification accuracy and resistance against adversarial attacks in the recognition of traffic signs. It did so by evaluating the performance of two CNN models, ResNet18 and ResNet50, on a dataset of traffic signs using several data augmentation techniques and adversarial attacks. The results showed that integrating data augmentation methods and adversarial training can improve the robustness of the models against adversarial attacks. The ResNet18 model trained with adversarial training and data augmentation techniques achieved an average attack rate of 0.45 and 0.77 on the DeepFool and I-FGSM attacks, respectively. However, it is also notable that the loss over the epochs was high, indicating that the models may still be vulnerable to other types of attacks. The results show that the models, specifically the ResNet50 model, are less resistant to the I-FGSM attack. Additionally, it is noteworthy that the model’s accuracy decreases when data augmentation techniques, which aim to simulate real-world scenarios, are applied. Overall, this research highlights the importance of considering the robustness of models in the context of computer vision tasks and the need for further research to improve the robustness of CNN models against adversarial attacks.
Noor Meijer (Spring 2022) pdf
Keywords: Predicting Airbnb listing price, Public Dataset

Abstract: Airbnb is one of the most popular and fastest growing sharing platforms in the world. However, its offer of peer-to-peer accommodation can make it difficult to properly price listings. This thesis provides new insights into price prediction by adding review information to a price prediction model with standard listing characteristics. Reviews are often overlooked in Airbnb prediction research but offer valuable insights. Using an open-source Airbnb dataset, several review features are mined from a large set of Airbnb listing reviews. The unsupervised learning method topic modelling is applied to the reviews and its output is included as a predictor, which results in improved predictive performance for listing prices. In addition, (weighted) sentiment scores are obtained using VADER and transformed into features. However, they only marginally improve price prediction, due to the skewed distribution of sentiment scores. Both a Support Vector Regression and an XGBoost model are a good fit for the Airbnb data, although XGBoost provides the best performance.
Bidus Plomp (Spring 2022) pdf
Keywords: Aalbers phone use dataset, Predicting stress with phone use features

Abstract: Stress levels have been on the rise in recent years. With a proven negative relation between stress and health, action must be taken. Therefore, this study aimed to investigate to what extent it was possible to predict stress levels using passively logged smartphone data. The data used in this study consisted of the smartphone usage logs of 227 students and their responses to a questionnaire about their mental health. The first step in this study was to investigate the best way of representing the smartphone usage data. This was done by testing different feature representations in combination with various machine learning classification models to determine which combination most accurately predicts perceived stress levels. The best results were found for the feature representation with time and count per app category combined with the Random Forest model. Subsequently, it was investigated whether oversampling of the minority class by means of SMOTE, a technique not previously used in the relevant literature, produced better results. The results showed that the use of this technique indeed yielded better results. Furthermore, research that used personal information showed similar outcomes, though it scored slightly lower than the research that also used physiological sensors. In conclusion, stress prediction using only smartphone usage data did not achieve the same results as the current standard; however, a step in the right direction has been made and further research is suggested.
Katharina Pritzl (Spring 2022) pdf
Keywords: Aalbers phone use dataset, Deep Learning, Predicting next application to be used

Abstract: Although prior research has addressed the task of next-app prediction, the area of predicting app engagement is widely unresearched. Since engagement predictions for mobile-phone applications can help to personalize features related to app usage, thus leading to a higher user satisfaction, the present work introduces a deep-learning approach to address this research gap. By proposing a generic LSTM that is capable of including multiple independent app usage sequences, the multiclass classification problem of dwell time prediction is addressed. Not only a solution for the cold-start problem, but also a framework for other areas where vast amounts of independent sequences have to be processed is introduced. Additionally, the tradeoff between users’ privacy, accuracy, and computational efficiency is investigated by comparing the performance of the proposed LSTM and an MLP trained on different feature subsets. As the basis for this research, a dataset of 186 users with over four million app records was used. While the LSTM outperformed the MLP in all tests, and was particularly suited for working on a reduced feature subset, the best performing LSTM reached an accuracy of 0.46. For the cold-start problem, an accuracy of 0.41 could be reached.
Thomas Willekes (Spring 2022) pdf
Keywords: Predicting tennis outcomes, Public Dataset

Abstract: Forecasting sports outcomes is a fundamental occupation of all who follow and/or analyse a sport. Tennis is no exception in this regard. However, machine learning/deep learning is relatively novel in this domain. Due to this novelty, a strong benchmark concerning features, algorithms, and time period seems to be lacking. Hence, we ask how various machine learning/deep learning algorithms utilize point statistics, player rankings, contextual information, and betting odds when predicting ATP matches. The performance is compared to the state-of-the-art Elo model and a simplified bookmaker consensus model. The construction of the features is enabled by the data sources from Jeff Sackmann and tennis-data.co.uk. Through the unique combination of the out-of-sample time period, the number of features used, and the number of algorithms used, this thesis contributes a structured review of all important dimensions in this domain. It is found that the multilayer perceptron has the best overall performance, with the betting odds as the main driver of this performance. Furthermore, the random forest closely follows the multilayer perceptron in terms of performance and performs the most consistently when ranking-based subsets are investigated. Overall, the state-of-the-art statistical baseline is beaten by all trained models, indicating a strong argument for using machine learning in this domain.
Robin van Heesch (Fall 2021) pdf
Keywords: Sequence Mining Techniques, Predicting purchase intention, Public Data

Abstract: In the past two years, COVID-19 has been the antecedent for different developments. The growth of e-commerce is one of those developments. E-commerce businesses need to understand their customers’ behavior in order to gain competitive advantages and increase conversion rates. This study focuses on the prediction of a customer’s e-commerce purchase. Most research regarding the prediction of e-commerce purchases concerns the creation of recommendation systems. This study uses knowledge of customer behavior to predict whether a customer will commit to a purchase. These predictions were made using sequential pattern mining (SPM) techniques combined with machine learning algorithms. Using the SPM algorithms PrefixSpan, Closed Sequential Patterns (ClaSP), and Vertical Mining of Maximal Sequential Patterns (VMSP), several frequent sequences were extracted from the data of an online electronics store. Those frequent sequences served as the features for the machine learning algorithms. Extreme Gradient Boosting (XGBoost) and K-nearest neighbor (KNN) were used to predict whether a customer will purchase. This study compared those two models with three different feature subsets, namely the top five frequent sequences from PrefixSpan, ClaSP, and VMSP. Based on F1-scores, four out of six models outperformed the baseline model. Overall, XGBoost performed best. The features obtained from the ClaSP algorithm resulted in the best overall performance. The machine learning algorithms with features extracted by the VMSP algorithm resulted in the worst overall performance.
Tessa Roes (Fall 2021) pdf
Keywords: Recommender systems, Predicting song play count, Public Data

Abstract: Recommender systems are extensively used to recommend songs to users. In this research, a hybrid two-stage recommender system for Spotify is constructed. The dataset contains users, songs, song features and play counts. To generate recommendations, clusters of similar songs are generated. Within the cluster, similar songs are found. A user’s rating for a certain song is predicted by the rating of similar songs. Subsequently, songs with a high prediction value are recommended (Ahuja, Solanki & Nayyar, 2019). This thesis researches to what extent feature engineering techniques affect cluster similarity, as well as the performance of recommendation systems. To do this, a number of feature engineering techniques are used before clustering. The feature engineering methods that are discussed are feature selection, dimensionality reduction and missing data imputation. This study concludes that feature selection and dimensionality reduction improve cluster similarity, as well as model performance. In contrast, missing data imputation leads to lower cluster similarity. The effect of data imputation on performance cannot be determined with certainty, since the test sets are dissimilar. However, data imputation is still preferred since it improves the naïve baseline to a greater extent than the degree to which the standard model outperformed its naïve baseline. Further, the method correctly deals with missing data. This results in higher robustness among results.
Floris Zanders (Fall 2021) pdf
Keywords: Aalbers phone use dataset, Predicting procrastination with phone use features

Abstract:
Lotte van der Klei (Spring 2021) pdf
Keywords: Aalbers phone use dataset, Predicting personality traits with phone use features

Abstract: Large amounts of data are being collected every day through the usage of smartphones. The passive collection of smartphone data leads to new opportunities for research on human behavior, where the active participation of humans is not required. Prior research has stated the importance of the big five personality traits in several research fields; however, little research has been done on objectively quantifiable behavior of individuals. This research investigates to what extent the big five personality traits can be predicted based on smartphone usage. Prior research found several features that could be of influence when predicting the five personality traits. In this research, these features are combined with several extracted feature sets when predicting personality. The dataset used for this research contains data on 221 participants, including their personality scores and features extracted from their phone usage over a period of five months. The features extracted from the raw data set were mainly combinations of spatial features, communication application features, features from categorized applications, and features on notifications, time of smartphone usage, and battery percentage. Random forest, logistic regression, and support vector machine models have been tested on the combination of these feature sets. The best performing machine learning models per personality trait slightly outperformed their baseline model. The best performing model for the personality traits that could be predicted above baseline was the random forest model. The best predicted personality trait in comparison to the baseline was openness. The models for agreeableness, conscientiousness, and neuroticism also outperformed their baseline models. Feature importances were extracted to make the models used for this research more explainable and interpretable. Actual correlations of the important features could not be found in this research.
Sjors van den Boomen (Spring 2021) pdf
Keywords: Predicting parcel delivery time, Proprietary Dataset

Abstract: The goal for this thesis was to find how SES data might influence parcel delivery time prediction. A dataset from a bicycle company in Eindhoven (TDV) was used to build a baseline simple regression model to predict parcel delivery time. This dataset was then coupled to a dataset from Statistics Netherlands (CBS) that contained several categories of SES data. From this data regression models were built per SES data category. The algorithms used for these models were Linear Regression and regularized regression models Lasso Regression, Ridge Regression and ElasticNet Regression. This thesis has not found significant results that indicate that SES data has a meaningful contribution to the prediction of parcel delivery time.
Ramya Ramachandran (Spring 2021) pdf
Keywords: Aalbers phone use dataset, Predicting procrastination with phone use features

Abstract: Procrastination has a growing effect on society. Previous studies have shown the costs of procrastination as a behavioral trait and the benefits of using procrastination as a self-regulatory strategy. This study aims to find out how well variables from different data collection methods can predict trait and momentary procrastination. The two methods of data collection assessed in this study are the one-shot questionnaire method, called the “Survey Method”, which measures trait behaviors, and the “Experience Sampling Method”, which assesses behavior repeatedly over time. Four machine learning algorithms are used to build the best predictive model: Linear Regression, K-Nearest Neighbors, Random Forest, and XGB Regressor. The analyses have shown that combining trait behavior features with momentary behavior features using XGB Regressor predicts momentary procrastination 23% better than the baseline model. This study highlights the costs and benefits of using these models.
Stephan Krijger (Spring 2021) pdf
Keywords: Predicting startup company success, Open Dataset

Abstract: This thesis has focused on predicting startup status for different categories of startups based on publicly available data. A startup can have four types of status: operating, acquired, initial public offering (IPO), or closed. The research question was “To what extent can startup status be predicted for different categories of startups, based on publicly available data?” This study featured two clustering algorithms to decide the optimal number of categories of startups, seven models for predicting startup status, and six methods for overcoming class imbalance, as the target class is heavily imbalanced. The data comes from the startup database website Crunchbase.com and contains information about startups from 2000-2015. Features from current literature were, where possible, put into the model. Four startup categories were studied: Information Technology, Health Care, Consumer Discretionary, and Finance. The best F1-scores per startup category were 40.7%, 39.2%, 45.4%, and 38.5%, respectively. The best machine learning models per startup category were all tree-based classifiers, but they differed per category. They all outperformed the baseline that did not make use of oversampling techniques. The best methods for oversampling the minority class were SMOTE-ENN for one startup category and ProWSyn for three startup categories. To assess how well the models predict future outcomes, the best models and oversampling methods were then tested on a newer dataset covering 2016-2019. The F1-scores for Information Technology, Health Care, Consumer Discretionary, and Finance were 35.6%, 40.8%, 31.5%, and 33.3%, respectively. This thesis highlights the differences between startup categories and the performance of tree-based classifiers on this problem. It also shows how difficult it still is for machine learning models to predict on multi-class imbalanced datasets. Although results were better than the baseline, more work should be done in order to give venture capitalists or startup founders an accurate prediction of their future startup status.
Laura Kooijman (Fall 2020) pdf
Abstract: In professional cycling, training schedules are optimized to perfection. But to know beforehand whether a training schedule will have the desired effect, one needs to know what effect past training had on race performance. In this research, a logistic regression, a support vector machine, and a random forest are developed to predict the race performance of a professional female cyclist based on training load. The data consists of the races and training sessions of 2017-2019. The research question that will be answered is: To what extent can race performance be predicted in cycling, based on training load?

Athlete data is often limited in size, as athletes can only do a limited number of races per year, which makes the data impractical for predictive modelling. This study investigates which techniques are helpful in classifying race performance. Class balancing is performed using weight adjustment and the SMOTE technique. In addition, PCA is performed. The random forest with weight adjustment gave the best result, with an F1-score of 0.88, which shows that it is possible to predict race performance with a small dataset. PCA improved prediction for the SVM, which reached an F1-score of 0.872, but this is still not as high as the score of the random forest. This means that PCA was not beneficial for this dataset.
Loes Modderman (Fall 2020) pdf
Abstract: Amusement parks deal with crowdedness and managing crowds almost every day. This crowdedness may cause waiting times in general to increase, which in turn may cause dissatisfied guests (Furnham, Treglown, and Horne 2020). Research on managing flow and capacity in amusement parks has studied for several years how to control crowds using several methods, mostly by running simulations (Ahmadi 1997; Cheng et al. 2013; Zhang, Li, and Su 2017; Yuan and Zheng 2018). This study researches the effect of three different crowd management methods on the waiting time of attractions and the crowd distribution in the amusement park the Efteling. The three methods are the placement of physical signage across the park, the sending of push notifications containing information and tips about crowdedness, and a phone app that recommends attractions and restaurants. This research analyses these effects using data collected in real life. The data used for this study is provided by the Efteling. The results show that none of the crowd management methods has an effect on the waiting times or the crowd distribution. However, these results may be inaccurate and can be improved by further optimising the prediction models that are used to compute the results.
Jeroen Simonse (Fall 2020) pdf
Abstract: With 30,000 unique game titles already available on the Steam platform, and developers releasing more games every year, the average gamer might be overwhelmed by the abundance of game titles. To make sure the users of the platform don’t get lost in the game store, there are systems working in the background that make sure users only get to see relevant games. These systems are called recommender systems, and they try to recommend games to the user based on the user’s past interests. The two most popular methods for recommending items to users are the collaborative filtering method and the content-based filtering method. This study focuses on these two methods and tries to determine which of the two provides the best recommendations for the users of Steam. The data used for this study contained information about which games each user owned, how long each user played a game, and the characteristics of each game. First, the collaborative model was constructed, which combines individual interests with the opinions of other users to predict recommendations. Then the content-based model was constructed, which focuses more on the contents and characteristics of a game. The results of this study showed that the collaborative filtering method was superior to the content-based method, which corresponds with the research that has already been conducted on this topic.
Mehmet Turgut (Fall 2020) pdf
Abstract: For this thesis, the predictive performances of Machine Learning and Deep Learning methods have been researched for predicting Mixed Martial Arts matches. The goal of this study was to answer the research question “What is the difference in the prediction performance that can be achieved by DL models compared to traditional ML models when predicting MMA matches?” During this study, a Random Forest and an Artificial Neural Network were trained on data from the past 22 years. The data is made available by the Ultimate Fighting Championship.

Two data sets were scraped from www.ufcstats.com. These were then combined into a single data set. During the feature engineering process, great emphasis was put on preventing information leakage. Using random search algorithms for hyperparameter tuning, the Random Forest and Neural Network were able to achieve test set accuracies of 58.98% and 59.11%, respectively. These accuracies are in line with the results of other similar studies focusing on sports prediction. It is therefore hard to say whether the sport of Mixed Martial Arts is more or less predictable than other sports that have commonly been researched.

However, information leakage is not always taken into account while building models for sports prediction. An additional set of models was trained to find out what the effect would be if no precautions were taken to prevent information leakage. The Random Forest and Neural Network models with information leakage achieved test set accuracies of 65.11% and 68.59%, respectively.

The results of this thesis highlight the predictive performance of Random Forests and Neural Networks with regard to Mixed Martial Arts predictions. In addition, the results also highlight the effects of information leakage, not just in sports prediction, but in all of Machine Learning. Insufficient measures to prevent information leakage can lead to overly optimistic results, which are not realistically achievable in real-life settings once a model is deployed.
Carien Dijkhof (Spring 2020) pdf
Abstract: In a society where young adults use smartphones and social media extensively, the effects of this usage are increasingly being researched due to negative effects on well-being. This study focuses on the influence of social media usage on mental tiredness and on forecasting mental tiredness among young Dutch adults using binary classification models.

Most research on the effects of social media on mental state has a social science or experimental origin. This research tries to predict mental tiredness based on social media usage with the use of machine learning tools. With exploratory data analysis and feature selection tools, relevant features are selected for this classification problem.

Using classification algorithms Logistic Regression, K-nearest Neighbour, Random Forest, and Support Vector Machines, this research will compare 4 main models with different subsets of data and features. The algorithms are tested on data derived from two datasets containing phone use data and self-reported mood data.

The results show an inconsistent trend against the baseline. KNN and Logistic Regression showed no clear improvement over the baseline. In general, Random Forest and SVM performed better than the baseline approaches, with Random Forest showing on average the best performance in terms of accuracy among the 4 different classification algorithms. The highest single accuracy was achieved by an SVM model.

The results provide new models for detecting mental tiredness among young Dutch adults, which can be used in future research. In future research, adding further meaningful features to these models could potentially improve classification performance.
Andrea Favia (Spring 2020) pdf
Abstract: Interpretability in models which deal with human language is a field rich in questions and potential, and cutting-edge research brings innovation to the tools used to understand how deep learning models reach an interesting level of capability in the classification, prediction and generation of language. The present thesis focuses on the processing of text data by means of two different transformer models, a POS-Tagger and a Language Model (henceforth LM), to investigate whether these go through similar learning phases when given the same data tagged on different levels of abstraction (namely syntax and lexicon). The intuition is that the LM will have to undergo a learning process similar to that of the POS-Tagger, since syntactic acquisition is a prerequisite for the acquisition of higher levels of language, an idea that has been researched in the field of deep learning by taking inspiration from language acquisition.

The models have been trained on the WikiText dataset, which is a widely employed dataset in language modeling and POS tagging tasks, often used for benchmarking models.

In order to investigate how the models learn, and whether they might follow similar learning patterns, the SVCCA and CKA algorithms have been applied to measure and compare layer similarity within and between the models, as well as the similarity of the same layers across epochs during training. A second goal of this thesis was to determine whether the two methods yielded comparable results. These algorithms made it possible to find significant similarities between the first two layers of the models, and also to gain insights into the LM structure, suggesting that the two sets of layers share similar information and have learned similar features. Finally, the results of SVCCA and CKA are shown, and a case is made for which algorithm may be more appropriate for certain analyses.
Ion Iuncu (Spring 2020) pdf
Abstract:
Joeri van de Rijdt (Spring 2020) pdf
Abstract: The purpose of this study is to predict the correct social media usage class of individuals. The basis for these classes is the duration that individuals spend on social media platforms. Different machine learning algorithms are utilized to address this problem. The data at hand consists of phone tracking data and mood survey data. The main question to be answered is to what extent machine learning algorithms can predict these social media classes, using a combination of the two mentioned data types. Neither this problem nor the combination of phone and mood features has been studied in the literature before. The metrics used to evaluate the performance of the models are accuracy and recall. As it turns out, all models outperform their benchmark when it comes to accuracy. With regard to recall, only some of the models outperform their benchmark. Recall is a more important metric than accuracy for predicting problematic social media usage. Overall, the predictive value of the machine learning algorithms is not large enough to have an impact on businesses. There are several opportunities for future research.
Marieke Roost (Spring 2020) pdf
Abstract: People seem to use their smartphone more intensively every year. Excessive use of smartphones influences people’s mood, mental health, and well-being in a negative way. This excessive use is becoming a big problem as people are experiencing difficulties because of this in daily life. Previous research has studied the relationship between mood and smartphone use by analyzing one or a few negative emotions. This present research uses a representational similarity analysis, a multivariate analysis method, to study this relationship with a broader range of moods and smartphone behavior features. The results of this research show weak correlations between similarity in smartphone use and similarity in mood in general with this analysis method and data. Furthermore, a small difference is found between positive and negative moods for smartphone behavior. Also, this study shows that the duration of smartphone use and the frequency of smartphone use are both useful measures to explain smartphone use. The results suggest that this present research might not provide enough information to state that there is a strong relationship between smartphone use and mood using RSA and this type of data.
Maartje Verhoeven (Spring 2020) pdf
Abstract: Insight into how smartphone usage affects mood is important in order to enable people to use their smartphones in a manner that improves rather than deteriorates their wellbeing, as an increasing number of people are struggling with their smartphone usage. This study investigates the extent to which smartphone application usage can predict mood among blocks of measurement in panel studies. Panel conditioning and panel attrition are widely discussed in the social sciences as factors that affect the quality of results, but this has, to our knowledge, never been taken into account for predictive models in the field of data science. Data from a population of 124 first-year Psychology students at Tilburg University, measured over a period of 34 days, were used to train and tune several learning algorithms and compare models from different blocks of measurements. Results indicate that the Random Forest (RF) classifier best predicts mood from application usage, and that the model containing data from the first half of the study scores highest in comparison to the other defined models. However, the achieved accuracy scores were only slightly above the baseline and the predictive performance is therefore considered to be low. It is recommended that future research use more frequent mood measurements, as it was not possible to capture the experienced mood at the moment that the smartphone was used with the limited measurements from this study.
Fenna Bronwasser (Fall 2019) pdf
Abstract: Typographic errors, although minor incidents, can influence the writing process by breaking the linear writing flow. Classifying typographic revisions could function as a first step towards further analysis on reducing the undesired effects of typographic errors, or serve to filter out typographic revisions, as typographic errors are non-deliberate, mechanical errors which one might not want to include when analyzing other types of revisions. This study is a continuation of the research of Conijn, Zaanen, van Leijten, & Van Waes (2019). Whereas previous studies on typographic/typing errors focused mainly on the finished writing product, this study utilizes a process-based approach which allows for the exploration of new features. In contrast to the research of Conijn et al. (2019), this research focusses on typographic revisions instead of typographic errors, trains the model on writing tasks which have a more natural setting, compares the typographic revision classification between first-language writers and second-language writers, and uses fluency-based features for building a classification model. Results show that the fluency-based model was reasonably effective in classifying typographic revisions. A difference in performance of the model between first- and second-language writers was found; however, it is unclear whether the dissimilarity in language and language acquaintance accounts for this performance difference.
Gary van Koeverden (Fall 2019) pdf
Abstract: Stress has a growing effect on society, and predicting stress through smartphone usage seems a cost-effective and convenient method to measure it. This study investigates the influence of different phone usage features on perceived stress levels and tests whether these features can identify perceived stress levels. Five different models are tested, both user-specific and generic. Three different classification algorithms were used: Random Forest, Support Vector Machine and k-Nearest Neighbours. The data consisted of two different sets: one dataset of returned mental health surveys from a group of respondents, and one dataset of phone usage log data from the same group. These data were merged, and the aim of this study was to predict stress from small time frames of at most two hours of phone usage data preceding every returned survey. First, an exploratory analysis was done on the different models to test which features have the strongest relation with the target stress levels. Afterwards, the five models were tested with the classification algorithms. The classification results indicate that the classification algorithms do not perform better on the user-specific models as predictors of stress than on the generic models. The different classification algorithms and models show very dissimilar results and, in general, do not predict better than the baseline.
Bram de Kroon (Fall 2019) pdf
Abstract: Mobile phone technologies have developed rapidly over the past few years. To facilitate these continuously developing technologies, smartphones demand increasingly more battery capacity. Improving smartphone energy efficiency is an ongoing challenge and is being addressed from numerous perspectives. On another note, there is increasing interest in understanding how smartphone users use their phone. This work is an initial attempt to determine how smartphone users adapt their phone usage behavior to the battery level of their phone. Answers to this question might prove relevant for research on smartphone energy efficiency as well as for the understanding of smartphone usage itself. Six features have been evaluated, each of which represented a concept of phone usage behavior and quantified to what extent a smartphone user adapts that behavior between two intervals of battery level. Results suggest that distinct patterns of change in phone usage behavior cannot be accurately captured using global intervals of battery level. Instead, they suggest that we should look closely at how smartphone users adapt their phone usage behavior on a more continuous scale of battery level.
Siyou Liu (Fall 2019) pdf
Abstract: The goal of this research is to examine what method can be used to transform the Likert scores of different emotions into one emotional state indicator, and how accurately a Random Forest classifier can classify people’s emotional state based on their app usage behavior and app category. The research question is: How accurately can people’s negative emotional state be classified by their app usage behavior and app category? Whilst much previous research investigated the association between people’s phone usage and emotion, this research sets out to examine the joint effect of app usage features such as duration, frequency, and earliest usage time, together with six different types of app categories, which provides deeper insights into app usage behavior. The app usage dataset used in this research was generated by software, which is more reliable than self-reported app usage activities manually filled in by the users. Prior to the classification, this research also used a dataset that contains eight different negative scores, measured on a five-point Likert scale, to create the target variable. To be able to classify emotional state instead of discrete emotions, the Likert scores of these eight variables were transformed into one emotional indicator using k-means clustering and principal component analysis. A resampling method and a feature selection technique based on feature importance were used to further improve model accuracy. By the end of the research, an accuracy of 90% was achieved.
Emma Janssen (Fall 2019) pdf
Abstract: There has been considerable interest in the recognition of activities in day-to-day tasks. The coordination of movements and gaze plays an important role in this process. With the development of wearable cameras, eye movements can be analysed from an egocentric view while performing activities of daily life (ADL). More research on this subject could result in more knowledge on activity recognition, which could contribute to research in the practical field (healthcare). This study aims to predict to what extent an activity is performed in ADLs, based on eye movements from an egocentric view. This is done by the annotation and analysis of wearable videos from six participants while performing the activity of tea-making, which consisted of several smaller actions (e.g. pouring water, finding a cup, etc.). The Random Forest technique was used in order to find an answer to our research question. Analysis showed that there were no major differences in performance between models. However, the models performed better than the majority baseline score. It could be concluded that the models used in this study add value in predicting activities. In addition, the performance on each action was analysed separately. This analysis showed a difference in the predictability of actions.
Roderick Korthals (Fall 2019) pdf
Abstract: The main goal of this study is to examine to what extent daily stress and anxiety levels can be predicted by analyzing smartphone usage data. The literature makes clear that smartphone use is linked to stress and anxiety, and predictive modeling has shown the potential to utilize smartphone data to successfully predict mood. Therefore, a generic and a group-personalized model were used to perform prediction tasks 1 and 2. The first prediction task examined to what extent daily stress and anxiety levels of smartphone users can be predicted by analyzing smartphone usage data. The second prediction task examined to what extent daily stress and anxiety levels can be predicted by analyzing smartphone usage data and the context when using a smartphone. Four machine learning algorithms were applied, namely decision tree, logistic regression, support vector machine, and random forest. Overall, the models performed poorly for predicting stress and anxiety, and the results showed that the models performed better for predicting anxiety than stress. The random forest was the only model that had a moderate performance in the generic and group-personalized model. Adding external factors improved the prediction performance of the models. Moreover, the group-personalized model did improve the prediction task. The percentage of notifications and the number of sessions were the most important features in the generic model to predict anxiety. There were no crucial features identified in the generic model to predict stress. Finally, in the group-personalized models, the daily use of other (not defined) applications was of most importance when predicting stress. Daily use of Social Media, daily use of other applications, and daily use of messaging apps were the most important features to predict anxiety, although their value was limited.
Since this research showed that group-personalized models had limited value in the prediction task, further research should use personalized models to predict mood. In addition, neural networks could be used, which seem to be more suitable for the prediction task.
Marlijn Y. Moonen (Fall 2019) pdf
Abstract: The main goal of this study was to evaluate conventional classification algorithms at predicting a mobile phone user’s energy level from their phone activities. Before evaluating the classifiers on the data, this study aimed to examine whether the predefined target variable ‘energy level’, which consisted of six classes, was constructed in a reliable way. We found that there were no significant differences between some classes, hence the original six classes were merged into four significantly different classes. As the reordered target variable has four classes, this study is concerned with a multi-class classification problem. Additionally, the target variable was merged into two classes, thereby making it a binary target variable. As such, we were able to evaluate the classifiers on both multi-class and binary classification problems and thereafter compare the results. Four classifiers are used on these classification problems, namely k-Nearest Neighbor, Support Vector Machine, Random Forest and Logistic Regression. The Random Forest classifier is the best performing classifier (0.521) on the multi-class classification task, and the k-Nearest Neighbor classifier (0.702) is the best performing on the binary classification task. Furthermore, this study examined to what extent feature importance affects the models. Most of the models were rarely affected by the removal of features and still managed to perform well.
Jorina Scherff (Fall 2019) pdf
Abstract: Due to the expanding number of mobile phone applications, consumers face a complex decision about which app to use. By predicting which application someone will use next based on past behavior, much research has tried to make automated recommendations that help the user simplify this process. However, app owners can use such prediction models as well, to decide when to send an advertisement banner for their app. Instead of predicting the next app as accurately as possible by using as much information as possible, this model predicts the next app category by using only data that is accessible to companies. This has, to our knowledge, not been investigated earlier, so the research question of this thesis was as follows: To what extent can the phone application category that someone will open be predicted, while using only mobile phone data that is accessible to companies? A dataset is used containing mobile phone usage data with categories assigned to the apps. Different classification models are compared, and our findings demonstrate that Support Vector Machine worked best with the features previous app opened, notification, hour of day, and duration of previous app. However, there was a large difference in the recall values of the different categories, mostly caused by differences in how often each category occurs in the dataset. Therefore, how useful this model is depends on the popularity of the app category.
Olga Vieru (Fall 2019) pdf
Abstract: This research aims to explore whether we can predict and recommend information most relevant to individual users, based on similar users in the same demographic category. Existing user-based collaborative filtering algorithms are applied, in order to produce personalized content recommendations based on group navigation patterns (page clicks). As a result, memory-based (neighborhood) approaches and model-based (matrix factorization) techniques are tested out to compare performance and results. The final dataset used for modelling is encoded in a 25x45 user-item feedback rating matrix. Among several algorithms that are tried out, our tuned singular value decomposition (SVD) model has the best performance and accuracy with RMSE=0.24, MAE=0.16, compared to a chance-level performance of RMSE=1.14, MAE=0.80. Generated output includes the ten most relevant URLs per user group, as well as five new predicted links that users might find interesting. Several domain-specific findings are discussed further on in this report. Moreover, an approach measuring user-item predicted interest is presented, in order to quantify each user group’s preference for a URL, based on the deviation from their overall mean rating. In conclusion, this thesis contributes a low-resource data collection method with Google Analytics, which could be used to inform decision-making in both commercial and non-commercial settings, and translated to other domains.
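For readers unfamiliar with the two error metrics reported above, a minimal sketch of how RMSE and MAE are computed is given here. The ratings are made-up illustrative values, not data from the thesis, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more strongly."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Illustrative (invented) ratings and predictions, not thesis data.
actual_ratings = [1.0, 2.0, 3.0, 4.0]
predicted_ratings = [1.5, 2.0, 2.5, 5.0]

rmse_value = rmse(actual_ratings, predicted_ratings)
mae_value = mae(actual_ratings, predicted_ratings)
print(f"RMSE={rmse_value:.3f}, MAE={mae_value:.3f}")
```

Lower values are better for both metrics, which is why the reported model scores are compared against the higher chance-level scores.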
Sophie Vink (Fall 2019) pdf
Abstract:
Aaron Wijnker (Fall 2019) pdf
Abstract: Stress levels seem to have risen in recent years. More people claim to experience longer periods of stress. This can have negative health effects. Prediction of stress is important for stress detection, treatment and the prevention of chronic stress. Phone use has also increased worldwide. Phones play a big role in our everyday lives, which has led researchers to believe that patterns in phone usage could identify people’s emotions and personality. The present study uses these phone usage patterns to predict stress. Different models have already been built to obtain information from phone usage data, focusing on app frequency. Previous research suggested that the order of used apps could present additional information for stress prediction. The results of the present study showed that, while both non-sequential frequency and sequential patterns are able to predict stress better than the majority baseline, the non-sequential patterns were more useful for stress prediction. This suggests that sequential patterns might not provide additional information for stress prediction. The results also provide an exciting new stress prediction method, which could be combined with existing methods.
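Several abstracts on this page compare models against a majority baseline. As a rough sketch of what that comparison involves, the snippet below always predicts the most frequent training label and measures the resulting accuracy. The labels are invented for illustration and the function name is hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)

# Invented example labels, not data from any thesis listed here.
train = ["low"] * 6 + ["high"] * 4
test = ["low", "low", "high", "low", "high"]

baseline = majority_baseline_accuracy(train, test)
print(f"Majority baseline accuracy: {baseline:.2f}")
```

A classifier only adds value if its accuracy on the same test labels exceeds this figure.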
Catherine Schwitzer (Fall 2018) pdf
Abstract: The goal of this research was to determine if clustering mobile phone data can be used to segment users into groups based on their behavior. Previous studies have attempted to profile users according to mobile phone behavior, but they pre-determined the qualities of the profiles manually as opposed to clustering. Studies that did utilize clustering for mobile phone analysis primarily focused on predicting the next app users would open. This research uses logging data from the MobileDNA app from Ghent University to create standard phone behavior features, as well as new ones that quantify notification response time. Principal component analysis was conducted on the feature set before clustering using DBSCAN. The clustering assigned most users to one main cluster, which suggests that clustering may not be the most appropriate method for user profiling and that users are better considered along a spectrum of behavior. Additionally, the study found that notification response time is an important feature in differentiating users and should be included in future studies.
Eefje de Louw (Fall 2018) pdf
Abstract: This research examines the effect of Conversational Human Voice on User Experience in chatbots in the context of survey research performed by the municipality of ’s-Hertogenbosch (The Netherlands). The literature suggests that Conversational Human Voice has a positive effect on User Experience, but this has never been tested. Therefore, this study proposes the following first research question: What is the effect of Conversational Human Voice on the User Experience? In Human-Computer Interaction research, it is known that humans are likely to attribute human characteristics to a computer when they interact with it, and that such interactions show similarities to Human-Human interaction. It is said that people tend to adapt their language use to that of their conversational agent. Therefore, this study proposes the second research question: To what extent do users alter their language use in response to the addition of elements of Conversational Human Voice to the chatbot? The results of 551 participants were analysed and the following conclusions are drawn: Conversational Human Voice does not necessarily lead to a higher User Experience score in the context of survey research. For survey research, one can rather stick to functional communication styles. Compared to the other Conversational Human Voice categories, inviting rhetoric seems most suitable in the context of survey research. Significant differences were found for the “helpfulness” and “clearness” of the chatbots. All in all, chatbots do show potential, but mainly in more bounded contexts such as answering frequently asked questions or managing appointments.
Ronny Brouwers (Fall 2017) pdf
Abstract: This thesis project used experimental, cross-situational word-learning data, in which the correct combination of pseudo words and novel objects had to be identified in an ambiguous setting. The first research question of this project aimed to identify which individual features could be used to predict whether subjects would learn a word pair correctly in the testing phase. Based on the literature, five potential features were identified and tested using a logistic regression and a random forest algorithm. The results showed that the more frequently a word-object pair was presented without uncertainty, the more likely the pair was to be learned correctly. The second research question focused on identifying different types of learners, as the literature showed that different subjects may learn differently. A Gaussian mixture model and hierarchical clustering were used, and clustering analysis showed that the identified clusters were poorly separated and contained much noise.
Yashi Thakkar (Spring 2017) pdf
Abstract: This thesis aims at predicting an image based on different texts describing that image and on questions asked while guessing the image. The main research question evaluates how far text data alone suffices to identify a portrait with the help of text mining and similarity measures. We used two types of vector space model combined with two types of decision rules and one similarity measure, namely cosine similarity. In our research, term frequency and inverse document frequency performed the best, with a precision of 49%. This shows how text by itself can help reduce the number of alternatives at the time of decision making. This seems very useful in areas concerned with tasks such as finding suspects. Moreover, retaining only a few parts of speech can help to increase the speed of mining, as they retain a lot of information while reducing the noise.
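As a rough illustration of the general technique named above (a TF-IDF vector space model ranked by cosine similarity), a minimal standard-library sketch follows. It is not the thesis implementation: the descriptions, the query, the smoothed IDF formula, and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency per term (an assumed variant)."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in tokens)
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(text.lower().split())
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical portrait descriptions and query, invented for illustration.
descriptions = [
    "man with short dark hair and glasses",
    "woman with long blond hair",
    "old man with grey beard",
]
query = "does the person have glasses and dark hair"

idf = fit_idf(descriptions)
query_vec = vectorize(query, idf)
scores = [cosine(query_vec, vectorize(d, idf)) for d in descriptions]
best = max(range(len(descriptions)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Ranking the candidates by score is one simple decision rule; the thesis mentions two decision rules, which are not specified in the abstract and are not reproduced here.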
Zeynep Oncu (Fall 2018) pdf
Abstract: This study is a comparative analysis of supervised learning tasks using categorical texts and free-text questions about facial details of people. The datasets used for this research were collected through online experiments. This study used pre-trained word representations, which are known to perform better than traditional text mining approaches. Categorical texts and word embeddings of question texts were used as features in classification algorithms, and their performance was evaluated by accuracy in predicting answers to the questions. In other words, the research question was formulated as “Which pre-trained word embedding and classification models perform best in terms of accuracy rate in predicting answers to yes/no questions?” The findings showed that the use of pre-trained word embedding models indeed led to better predictive performance in some classification algorithms compared to the baseline for this dataset.
Maarten Jansen pdf
Abstract: This work investigates culture and the cultural influence on visual narrative structures in comic books, using a machine learning approach. It is motivated by evidence that comics are susceptible to adjustments stimulated by culture, and by the scarcity of machine learning approaches in studies on comics. Our main contribution is a better understanding of cross-cultural differences in visual language, and of their evolution over time, from a novel perspective. We arrive at this contribution by asking the following research question: To what extent can we predict continent, country, and decade of publication based on narrative patterns in comic books published in the last eight decades? The answer to this question is obtained by training a decision tree and a naive Bayes classifier, using narrative patterns extracted from the Visual Language Research Corpus. The results demonstrate that there are cultural distinctions in patterns if we examine continent or country of publication. However, if we examine decade of publication we find fewer characteristics pointing towards changes over time, unless the time period investigated is of sufficient length, for example 50 years.