Through dimensionality reduction and clustering, we can group players into positions based on their similarities and differences with each other, as shown by our ARI and silhouette scores. Our best ARI score, 0.498, came from a pipeline combining UMAP and GMM. It indicates moderate agreement between the predicted clusters and the true clusters, which shows that the statistics we used in our data allowed us to make fairly accurate predictions. The best silhouette score, 0.62, came from the same pipeline. This silhouette score indicates that the clusters made by this pipeline are somewhat dense and well separated from each other, which aids the performance of the Gaussian Mixture Model.
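
To make the ARI figure above concrete, here is a minimal sketch of how an adjusted Rand index can be computed from two labelings. It is not the project's code: the function name `adjusted_rand_index` and the toy position labels are assumptions for illustration, and the project most likely relied on a library implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    # Count items per (true, predicted) cell and per row/column.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that fall together in each grouping.
    pairs_both = sum(comb(c, 2) for c in cell_counts.values())
    pairs_true = sum(comb(c, 2) for c in true_counts.values())
    pairs_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = pairs_true * pairs_pred / total_pairs  # chance agreement
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_both - expected) / (maximum - expected)


# Hypothetical toy data: six players, three true positions.
true_positions = ["DEF", "DEF", "MID", "MID", "FWD", "FWD"]
perfect_clusters = [0, 0, 1, 1, 2, 2]
noisy_clusters = [0, 0, 1, 2, 2, 2]  # one midfielder grouped with forwards

print(adjusted_rand_index(true_positions, perfect_clusters))  # 1.0
print(adjusted_rand_index(true_positions, noisy_clusters))    # about 0.444
```

A score of 1 means the clusters match the true positions exactly, while a score near 0 means the agreement is no better than chance, so 0.498 sits roughly halfway between the two.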

Predicting FIFA players’ positions using dimensionality reduction and clustering techniques is a real-world application with significant practical implications in the field of soccer analytics. By accurately grouping players into positions based on their similarities and differences in playing styles, teams and coaches can make more informed decisions in player selection, team formation, and tactical strategies. For instance, clustering players into positions such as defenders, midfielders, and forwards allows coaches to identify the most suitable players for each position, optimize team balance, and devise effective game plans tailored to their team’s strengths and weaknesses. Additionally, understanding player positioning and role assignments can facilitate talent scouting, player development, and recruitment strategies, helping teams build well-balanced and competitive squads. Overall, applying clustering techniques to FIFA player data enables teams to gain valuable insights into player roles and formations, ultimately enhancing team performance and competitiveness on the virtual pitch.

The analysis of various predictive models applied to NBA player statistics reveals nuanced performance patterns across different metrics. For points per game (PPG) predictions, simpler models, such as linear regression, surprisingly outperformed more complex neural network architectures, showcasing a potential trade-off between model complexity and predictive accuracy. The best-performing neural network model for PPG, with three layers, achieved a mean squared error (MSE) of 0.020002 and an R-squared value of 0.999528. Conversely, predictions for assists per game (APG) exhibited more variability, with neural network models demonstrating superior performance over the linear regression baseline, albeit with varying levels of success across architectures. The best-performing APG model, also with three layers, achieved an MSE of 0.998866. Interestingly, rebounds per game (RPG) predictions mirrored PPG outcomes, with linear regression models again showing competitive performance. The best-performing RPG model, utilizing three layers, achieved an MSE of 0.007080 and an R-squared value of 0.998631. Additionally, when predicting player positions, models exhibited improved accuracy in the 2022-2023 season compared to the previous year, suggesting potential advancements in model refinement or data quality. The best-performing model for player position prediction achieved an accuracy of 72%. Overall, these findings underscore the importance of carefully selecting model architectures and optimization techniques tailored to the specific predictive task, highlighting avenues for further exploration and refinement in sports analytics.
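
As a rough illustration of the two regression metrics quoted above, the sketch below computes MSE and R-squared by hand. The function names and the sample values are hypothetical and are not taken from the NBA dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions account for."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total
    return 1 - ss_res / ss_tot


# Hypothetical points-per-game values for three players.
actual_ppg = [10.0, 20.0, 30.0]
predicted_ppg = [11.0, 19.0, 30.0]

print(mean_squared_error(actual_ppg, predicted_ppg))  # about 0.667
print(r_squared(actual_ppg, predicted_ppg))           # about 0.99
```

MSE is expressed in squared units of the target, so MSE values for different statistics (points, assists, rebounds) are not directly comparable, whereas R-squared is unitless.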

The application of predictive modeling in NBA player statistics serves as a real-world example of how advanced analytics can inform decision-making in professional sports management. By accurately forecasting player performance metrics such as points, assists, and rebounds, teams can make informed decisions regarding player recruitment, contract negotiations, and game strategy. These predictive models enable teams to optimize player selection and allocation of resources, ultimately enhancing team competitiveness and performance on the court. Additionally, the insights gained from these models can also benefit fans, analysts, and sports commentators by providing deeper insights into player performance trends and game outcomes.

COGS 137

December 2023

Predicting Air Pollution Levels

Successfully developed a predictive model in R for forecasting annual average air pollution concentrations in US zip code regions, achieving a ROC AUC score of 1, which indicates perfect discrimination between the predicted and actual values. This performance is further supported by the Root Mean Square Error (RMSE) of 1.716, denoting a relatively low average prediction error. Additionally, the R-squared (RSQ) value of 0.61 signifies that a substantial proportion of the variance in pollution levels is explained by the model, demonstrating its capability to capture and account for variability. Moreover, the Mean Absolute Error (MAE) of 1.159 highlights the model’s accuracy in approximating actual pollution levels at the zip code level.

The assessment of feature importance revealed intriguing insights, with ‘state’ emerging as the most influential predictor, followed closely by ‘CMAQ,’ both significantly more impactful than other variables in the model. This underscores the importance of regional differentiation, potentially tied to state-specific policies or geographically bound factors, in influencing air pollution levels. By leveraging these metrics, policymakers, environmental agencies, public health officials, and urban planners can gain valuable insights into localized air quality variations and prioritize interventions to mitigate the adverse effects of air pollution on public health and the environment. Additionally, businesses can utilize the predictive model to inform strategic decisions aimed at reducing environmental impact and promoting sustainability across their operations and supply chains. Overall, the project’s outcomes demonstrate the potential of predictive modeling to address the complex challenges posed by air pollution and pave the way for evidence-based policies and interventions to improve air quality and safeguard public well-being.

The research explored the potential of EEG as a reliable measure of emotional responses during both passive video clip viewing and interactive video game playing. While EEG demonstrated high accuracy in distinguishing emotional responses during video clip viewing, its performance significantly declined when applied to video game playing scenarios. This disparity highlights the complexities involved in capturing emotional engagement during interactive experiences and underscores the need for further research to optimize EEG methodologies for such contexts. Addressing limitations such as sample size, algorithm selection, and demographic factors, along with integrating additional physiological measures, could enhance the validity and applicability of EEG in assessing emotional responses across various media consumption scenarios.

The findings of this research hold relevance for multiple stakeholders, including researchers, clinicians, and media developers. For researchers, understanding the nuances of emotional responses in different media environments can contribute to advancements in neuroscience and psychology, facilitating a deeper understanding of human behavior and cognition. Clinicians could leverage EEG-based measures to assess emotional states in patients undergoing therapy or treatment, aiding in personalized interventions and monitoring progress. Media developers and content creators could benefit from insights into the emotional impact of their products, allowing for the creation of more engaging and impactful experiences tailored to audience preferences and emotional responses. Ultimately, the project’s vision involves refining EEG methodologies to provide valuable insights into emotional engagement across various interactive media platforms, enhancing user experiences and informing the development of more emotionally resonant content in both entertainment and therapeutic settings.

The project aimed to enhance travel efficiency and customer satisfaction by predicting flight delays based on factors such as distance, origin, destination, and carrier. Contrary to the hypothesis favoring Linear SVC, the Random Forest model outperformed the others with a 56% accuracy rate. However, all models struggled to surpass the 57% mark, indicating how difficult it is to predict flight delays from the available dataset features alone. The limited dataset, covering three years of flight data from major cities, hindered the models’ ability to capture other influencing factors such as air traffic control, operational issues, and weather conditions.
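
An accuracy figure such as 56% is easier to interpret next to a trivial baseline that always predicts the most common class. The summary does not report the class balance of the flight data, so the sketch below uses made-up labels. The function names are hypothetical as well.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels for five flights; not drawn from the project's dataset.
actual = ["delayed", "on_time", "on_time", "delayed", "on_time"]
predicted = ["delayed", "on_time", "delayed", "on_time", "on_time"]

print(accuracy(actual, predicted))         # 0.6
print(majority_baseline_accuracy(actual))  # 0.6
```

In this toy case the classifier only ties the baseline. A comparison of this kind would show how much signal the flight features actually contribute beyond the class balance.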

The work is crucial for airlines, passengers, and airport authorities. For airlines, accurate delay prediction enables proactive measures to minimize disruptions, optimize resource allocation, and enhance customer service, leading to improved operational efficiency and reputation. Passengers benefit from reduced wait times and increased reliability, resulting in a smoother travel experience. Airport authorities can use such predictive models to streamline operations, allocate resources effectively, and enhance overall airport management. 

We conducted our own experiment to explore the emotional impact of social media content on individuals, using a brain-computer interface (BCI) with an EEG cap to measure brain responses. While the initial results did not yield significant differences in emotional responses between happy, neutral, and sad stimuli, the study highlighted the potential of BCIs in understanding the neural correlates of emotional experiences on social media platforms. Challenges such as data collection limitations and pre-processing setbacks hindered further analysis, but the project provided valuable insights into the complexities of studying emotional responses to digital content. Moving forward, opportunities for improvement include collecting more diverse data, implementing machine learning algorithms for classification, and refining data analysis techniques to better understand the relationship between brain responses and social media content.

The work holds significance for researchers, social media platforms, and mental health professionals interested in understanding the impact of digital content on emotional well-being. For researchers, the project provides a foundation for further exploration into the neural mechanisms underlying emotional responses to social media, offering insights that could inform psychological and neuroscientific theories of human emotion. Social media platforms could benefit from understanding how different types of content elicit emotional responses, allowing for the optimization of user experiences and the development of strategies to mitigate negative emotional effects. Additionally, mental health professionals may use insights from this research to design interventions aimed at promoting positive emotional experiences and reducing the potential harm of social media use on mental health. Ultimately, the project’s vision involves leveraging BCIs to enhance our understanding of the complex interplay between digital content and human emotions, paving the way for the development of evidence-based interventions and strategies to improve emotional well-being in the digital age.

The project aimed to predict whether H-1B visa applications would be certified or denied based on various factors related to the sponsoring employer, such as size, location, industry, and sales volume, as well as job details like wage and title. Through extensive analysis and the application of machine learning models including Random Forest, Logistic Regression, SVM, and KNN, the study found that incorporating employer metrics significantly improved the accuracy of predictions compared to baseline models and previous research. In particular, tree-based models like Random Forest and XGBoost exhibited high accuracy scores ranging from 90% to 98%, indicating the effectiveness of leveraging employer information in predicting H-1B visa outcomes. Moreover, the project highlighted the importance of hyperparameter tuning, demonstrating that exhaustive exploration of model settings can further enhance prediction accuracy.

The work holds significance for various stakeholders involved in the H-1B visa application process, including applicants, sponsoring companies, immigration authorities, and policymakers. For applicants, accurate prediction of visa outcomes can provide valuable insights into their chances of success and enable informed decision-making regarding career opportunities in the United States. Sponsoring companies can utilize the predictive models to optimize their visa application strategies, potentially increasing their success rates and reducing administrative burdens. Immigration authorities can benefit from improved efficiency in processing visa applications, while policymakers may use the findings to inform policy decisions aimed at streamlining the visa application process and promoting economic growth through skilled migration. Overall, the project’s findings offer practical insights into the role of employer metrics in H-1B visa approval, paving the way for further research and applications in immigration policy, corporate strategy, and international talent management.

In our project focused on analyzing the impact of logo redesigns on technology companies’ stock market performance in the United States, we found evidence suggesting a positive correlation between minimalist logo redesigns and increases in the companies’ Adjusted Closing Price. By examining each company’s stock data before and after the logo redesigns, we observed an overall upward trend in their Adjusted Closing Price, indicating a potential boost in investor confidence and market perception following the logo changes. However, our analysis did not find a significant correlation between logo redesigns and Daily Return, suggesting that other factors may influence short-term stock market fluctuations beyond logo aesthetics.
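
The two stock measures above behave differently, which may help explain the contrasting results. The sketch below assumes that Daily Return means the simple percentage change in Adjusted Closing Price from one trading day to the next. The summary does not define it, and the prices and function name are hypothetical.

```python
def daily_returns(adj_close):
    """Simple day-over-day returns from a list of adjusted closing prices."""
    return [
        (today - previous) / previous
        for previous, today in zip(adj_close, adj_close[1:])
    ]


# Hypothetical adjusted closing prices over five trading days.
prices = [100.0, 102.0, 101.0, 104.0, 106.0]
returns = daily_returns(prices)

print(returns)                      # first value is 0.02 (a 2% gain)
print(sum(returns) / len(returns))  # average daily return, about 1.5%
print(prices[-1] - prices[0])       # 6.0: the price level still rose
```

Because each return is measured relative to the previous day only, a steady rise in price level can coexist with small, noisy daily returns.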

The findings of our project have implications for various stakeholders, including investors, marketing professionals, and company executives. Investors can use the insights gained from our research to potentially identify trends in market response to logo redesigns and make informed decisions about their investment strategies. Marketing professionals can leverage this information to understand the impact of branding decisions on market perception and develop strategies to enhance brand equity through logo redesigns. Additionally, company executives can use our findings to inform their decision-making processes regarding branding initiatives and assess the potential impact on stock market performance. Overall, our project contributes valuable insights into the relationship between logo redesigns and stock market figures, offering practical implications for stakeholders seeking to understand and leverage the intersection of branding and financial performance in the technology sector.