07. Student projects

This page lists a variety of projects carried out by students at the University of Massachusetts Dartmouth. The list is far from exhaustive; it is intended to give you a feel for the types of applied statistical projects and research in which University of Massachusetts Dartmouth students engage.

Discrepancies in police killings in the US

The Washington Post newspaper keeps an up-to-date database of police killings in the US. Recorded variables include location (latitude and longitude), gender, age, and race. Students have used spatial data analysis techniques to examine these data for discrepancies in relation to age and race. Additionally, a database of the locations of US police stations allowed an analysis of the distribution of the distance from each police killing to the nearest police station.
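As a rough illustration of the nearest-station calculation, the sketch below computes great-circle (haversine) distances from latitude and longitude and takes the minimum over all stations. The function names and coordinates are hypothetical and chosen for illustration only; the students' actual tools and distance measure are not described here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_station_km(incident, stations):
    """Distance from one incident to its closest station (brute-force search)."""
    return min(haversine_km(incident[0], incident[1], s[0], s[1])
               for s in stations)


# Hypothetical (latitude, longitude) pairs, not real data.
stations = [(41.63, -70.93), (42.36, -71.06)]
incidents = [(41.64, -70.95), (42.00, -71.00)]

# One nearest-station distance per incident; the distribution of these
# values is what would then be analyzed.
distances = [nearest_station_km(p, stations) for p in incidents]
```

For a national data set, a brute-force search over every station may be slow, and a spatial index would likely be preferable.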

Modeling Medicare costs

Students modeled variations in Medicare costs for intracranial hemorrhages and cerebral infarctions across the United States. This project led to both an external conference paper presentation at the Virtual Student Research Symposium in Statistics and Data Science, 2020, Boston Chapter of the American Statistical Association, and a subsequent publication.

Success in a preliminary university year

Data was obtained from a New England university on students admitted to a preliminary year, prior to being admitted to a degree program. Administrators wanted to know which, if any, of many factors were predictive of student success in the preliminary year. Students built and analyzed logistic regression models and drew conclusions as to which factors were, and which were not, relevant to predicting success in the preliminary year.
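To make the modeling step concrete, here is a minimal sketch of a logistic regression fitted by batch gradient descent using only the Python standard library. The data, the two features, and all function names are invented for illustration; they are not the university's data, and in practice a statistical package that also reports standard errors would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Predicted probability of success; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept and coefficients by gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical standardized predictors (two made-up factors) and a
# success indicator (1 = succeeded in the preliminary year).
X = [[-1.5, 0.3], [-1.0, -0.8], [-0.5, -1.0], [-0.2, -0.4],
     [0.1, 0.9], [0.4, 1.0], [0.9, 0.2], [1.4, -0.1]]
y = [0, 0, 1, 0, 1, 0, 1, 1]

w = fit_logistic(X, y)
probs = [predict_proba(w, xi) for xi in X]
```

The sign and size of each fitted coefficient suggest how a factor relates to success, but judging whether a factor is actually relevant also requires a measure of uncertainty, which this sketch does not compute.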

Random geometric graph models for protein-protein interactions

Following on the foundational work of Natasa Pržulj, students obtained protein-protein interaction networks, which are fundamental in understanding modern biological cellular processes, from existing biological databases and applied the Higham-Rašajski-Pržulj algorithm for fitting random geometric graphs to model these networks. Along the way, modifications and additions to the basic algorithm were considered.
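For readers unfamiliar with the model, the sketch below generates a random geometric graph: nodes are placed uniformly at random in a unit cube and two nodes are joined when they lie within a chosen radius of each other. This shows only the generative model, not the Higham-Rašajski-Pržulj fitting algorithm; the function name and parameter values are assumptions made for illustration.

```python
import math
import random


def random_geometric_graph(n, radius, dim=2, seed=None):
    """Return (points, edges) for n uniform points in the unit cube [0, 1]^dim.

    An edge (i, j) with i < j is present when the Euclidean distance
    between points i and j is at most `radius`.
    """
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, edges


points, edges = random_geometric_graph(n=50, radius=0.2, seed=0)

# Node degrees, a basic summary one might compare against a real network.
degrees = [0] * len(points)
for i, j in edges:
    degrees[i] += 1
    degrees[j] += 1
```

Fitting runs in the opposite direction: given an observed network, one looks for point positions and a radius under which the network is plausible as a geometric graph.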

Machine learning models for distribution of plant species on sand dunes

Students analyzed data from sand dunes on Waquoit Bay, Cape Cod, Massachusetts to try to explain the factors influencing the distribution of plant species on the dunes. A linear support vector machine was used to assess whether the growth of different plant species could be successfully modeled. Maximum wind speed and salt spray were found to be the most predictive factors. (Reference)

Using spatial point processes to examine potential redistricting solutions

Political redistricting is usually carried out by potentially partisan human participants. However, researchers have devised algorithms that perform redistricting automatically without requiring further human judgment, and have also developed mathematical methods for measuring the severity of a gerrymander. This project provides a framework for evaluating redistricting algorithms that relies on a spatially distributed set of population points rather than census blocks. The population of a state is represented as a spatial point process in two-dimensional space. Because each person is represented as a point in space, algorithms can carry out redistricting at the individual level rather than the block level. (Reference)
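The idea of districting individual points rather than blocks can be illustrated with a toy example. The sketch below scatters people uniformly at random over a unit square and cuts the square into vertical strips holding nearly equal numbers of people. Both the uniform population and the strip rule are simplifying assumptions made here for illustration; they are not the algorithms or the gerrymandering measures evaluated in the project.

```python
import random


def simulate_population(n, seed=None):
    """Place n people uniformly at random in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def strip_districts(points, k):
    """Label each point with one of k vertical strips of near-equal population.

    Points are ranked by x-coordinate and the ranking is cut into k
    consecutive groups whose sizes differ by at most one.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    labels = [0] * len(points)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(points)
    return labels


people = simulate_population(1000, seed=1)
labels = strip_districts(people, k=4)

# Number of people assigned to each of the four districts.
sizes = [labels.count(d) for d in range(4)]
```

Because every person is a separate point, district populations can be balanced to within one person, which is not generally possible when whole census blocks must be assigned together.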