Data
Analytics, Science, and Modeling
Most of my data-related work is closed-source; however, I’ve put together a few older projects that were either open source from the start or released as open source on request.
- airbnb visualization in D3 An interactive D3.js map of Boston Airbnb listings using GeoJSON; the interesting part is the Python pipeline that converts raw listing data into a geospatial format.
- digging into my discord using data analysis Regex-based analysis of 200,000+ messages from a Discord server — frequency patterns, activity windows, and communication graphs.
- viral networks - khan academy interview project Implements graph-based infection spreading (total and limited strategies) plus a Markov chain analysis of the network, from a 2017 interview project.
- interpolating 3d Uses scipy’s griddata and kriging to interpolate 3D glacier coordinate datasets: a practical comparison of interpolation methods in three dimensions.
- fuzzy document retrieval using high performance latent semantic analysis Runs SVD on term-document matrices and exposes a Python interface for querying documents by semantic similarity.
- twitter civil unrest analysis with apache spark A 2015 prototype using Apache Spark’s streaming API to analyze civil unrest signals in real-time Twitter data; the functional pipeline patterns hold up better than the specific APIs.
- solving political boundaries through simulation Applies simulated annealing and genetic algorithms to redistricting — framing fair district boundary selection as a computational geometry optimization problem.
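
To give a flavor of the graph work in the viral networks project listed above, here is a minimal sketch of what a "total" infection strategy could look like. It assumes that "total" means infecting every user reachable from a starting user, which reduces to a breadth-first traversal of one connected component. The function name `total_infection`, the adjacency-list format, and the sample users are hypothetical and not taken from the project itself.

```python
from collections import deque


def total_infection(graph, start):
    """Return the set of users reachable from `start`.

    `graph` is assumed to be an undirected graph stored as a dict
    mapping each user to a list of directly connected users.
    """
    infected = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for neighbor in graph.get(user, ()):
            if neighbor not in infected:
                infected.add(neighbor)
                queue.append(neighbor)
    return infected


# Hypothetical network with two disconnected groups of users.
graph = {
    "ana": ["ben", "cy"],
    "ben": ["ana"],
    "cy": ["ana", "dee"],
    "dee": ["cy"],
    "eli": ["fay"],
    "fay": ["eli"],
}

infected = total_infection(graph, "ana")
```

Starting from `"ana"`, the traversal stays inside her connected group and never reaches `"eli"` or `"fay"`. A limited strategy would additionally need a stopping rule, such as a target number of infected users, which this sketch does not attempt.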