I should mention that there’s a subtle bit of legwork needed to enable random seeding when sampling from pandas DataFrames across Python PoolWorker instances. Beyond this, the bootstrapping code below is simple and follows almost directly from the definition. See Wasserman’s text All of Statistics for concise pseudocode and explanation. Here’s a Python… Continue reading Multi-process Bootstrapping for Pandas DataFrames
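The seeding issue arises because forked workers inherit a copy of the parent's random state, so unseeded calls to `DataFrame.sample` can return identical resamples in every worker. Below is one way to handle it. It is a minimal sketch, not necessarily the post's own code. It assumes pandas 1.1 or later, where `random_state` accepts a `numpy.random.Generator`. The helper names `_bootstrap_chunk` and `bootstrap_means` are hypothetical, and the statistic is simply the mean of one column. Each replicate gets its own seed spawned from a single root `SeedSequence`, so the result does not depend on how replicates are divided among workers.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def _bootstrap_chunk(args):
    """Bootstrap the mean of `column` once per seed in `seeds` (hypothetical helper)."""
    df, column, seeds = args
    means = []
    for s in seeds:
        # A fresh Generator per replicate: no RNG state is inherited from the
        # parent process, so forked workers cannot repeat each other's resamples.
        rng = np.random.default_rng(s)
        resample = df.sample(n=len(df), replace=True, random_state=rng)
        means.append(resample[column].mean())
    return means


def bootstrap_means(df, column, n_boot=1000, n_workers=4, seed=0):
    """Return an array of `n_boot` bootstrap means of `df[column]`."""
    # Spawn one child SeedSequence per replicate from a single root seed and
    # turn each into a plain list of ints, which pickles cleanly to workers.
    children = np.random.SeedSequence(seed).spawn(n_boot)
    seeds = [child.generate_state(4).tolist() for child in children]
    # Split the replicates into contiguous chunks, one per worker.
    bounds = np.linspace(0, n_boot, n_workers + 1).astype(int)
    tasks = [(df, column, seeds[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]
    if n_workers == 1:
        chunks = [_bootstrap_chunk(t) for t in tasks]
    else:
        with mp.Pool(n_workers) as pool:
            chunks = pool.map(_bootstrap_chunk, tasks)
    # Contiguous chunks keep replicates in seed order when concatenated.
    return np.array([m for chunk in chunks for m in chunk], dtype=float)


if __name__ == "__main__":
    data = pd.DataFrame({"comments": np.random.default_rng(1).poisson(5, 500)})
    boot = bootstrap_means(data, "comments", n_boot=2000, n_workers=4)
    print("95% percentile interval:", np.percentile(boot, [2.5, 97.5]))
```

The `__main__` guard matters on platforms that start workers with `spawn`. Without it, each child re-imports the script and would try to start its own pool.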
Category: Statistics
Bootstrapping Estimates for Comment Likelihood, Hacker News: EDA II
In my previous Hacker News EDA we looked at how words could be embedded in two dimensions. This time we implement a bootstrapping simulator to estimate the impact of posting time on the number of comments a post receives. Examining the dataset: to get an idea of which keywords are popular at different times of the day, we… Continue reading Bootstrapping Estimates for Comment Likelihood, Hacker News: EDA II
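A bootstrapping simulator of this kind might look roughly like the following. It is a sketch under stated assumptions, not the post's implementation. It assumes hypothetical columns `hour` (posting hour, 0 to 23) and `num_comments`, and it treats "comment likelihood" as the fraction of posts that receive at least one comment. Rows are resampled with replacement, and that fraction is recomputed per hour in every replicate.

```python
import numpy as np
import pandas as pd


def bootstrap_comment_likelihood(df, n_boot=1000, seed=0):
    """Bootstrap P(num_comments > 0) for each posting hour.

    Assumes hypothetical columns `hour` and `num_comments`.
    Returns a DataFrame with one row per replicate and one column per hour.
    """
    rng = np.random.default_rng(seed)
    commented = (df["num_comments"] > 0).to_numpy()
    hours = df["hour"].to_numpy()
    n = len(df)
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row positions with replacement
        # Group by a plain array to avoid index alignment on duplicated rows.
        rates = pd.Series(commented[idx]).groupby(hours[idx]).mean()
        replicates.append(rates)
    # Hours absent from a replicate show up as NaN in that row.
    return pd.DataFrame(replicates).reset_index(drop=True)
```

Calling `boot.quantile([0.025, 0.975])` on the result then gives a 95% percentile interval for each hour. `quantile` skips the NaN entries by default.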
Solve a Substitution Cipher with a Markov chain
There are k! substitution ciphers for an alphabet with k letters (26! ≈ 4 × 10^26 for the English alphabet), far too many for an exhaustive search. By adapting a frequency-based approach to the graph of alphabetic ciphers, we recast deciphering as a sampling problem suited to a Metropolis-Hastings random walk. A substitution cipher is thus solvable with a Markov chain. Let’s begin… Continue reading Solve a Substitution Cipher with a Markov chain
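The random walk described above can be sketched as follows. This is one possible version under stated assumptions, not the post's code. The "frequency-based" score is assumed to be an add-one-smoothed character-bigram model, over letters plus space, trained on a reference corpus that the caller supplies. A proposal swaps the plaintext images of two cipher letters. Because that proposal is symmetric, the Metropolis-Hastings acceptance ratio reduces to exp(new score minus old score). All function names are hypothetical.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def bigram_log_probs(corpus):
    """Add-one-smoothed log P(b | a) over letters plus space (assumed scoring model)."""
    symbols = ALPHABET + " "
    text = "".join(c if c in symbols else " " for c in corpus.lower())
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return {
        (a, b): math.log((pairs[(a, b)] + 1) / (firsts[a] + len(symbols)))
        for a in symbols
        for b in symbols
    }


def apply_key(text, key):
    """Map each letter through `key` (dict letter -> letter); other chars unchanged."""
    return "".join(key.get(c, c) for c in text)


def score(text, logp):
    """Log-likelihood of `text` under the bigram model."""
    return sum(logp.get(pair, 0.0) for pair in zip(text, text[1:]))


def metropolis_decipher(ciphertext, logp, n_iter=20000, seed=0):
    """Random walk over keys; returns the best key seen and its score."""
    rng = random.Random(seed)
    cipher = "".join(c if c in ALPHABET else " " for c in ciphertext.lower())
    perm = list(ALPHABET)
    rng.shuffle(perm)
    key = dict(zip(ALPHABET, perm))  # cipher letter -> candidate plaintext letter
    current = score(apply_key(cipher, key), logp)
    best_key, best = dict(key), current
    for _ in range(n_iter):
        a, b = rng.sample(ALPHABET, 2)
        key[a], key[b] = key[b], key[a]  # propose: swap two plaintext images
        proposed = score(apply_key(cipher, key), logp)
        # Symmetric proposal: accept with probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_key, best = dict(key), current
        else:
            key[a], key[b] = key[b], key[a]  # reject: undo the swap
    return best_key, best
```

Tracking the best key seen, rather than only the final state, is a common convenience. The chain itself still samples keys in proportion to their plausibility. On short ciphertexts, a longer corpus and more iterations usually matter more than any other tuning.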
Plant Pairs on the Tufts Campus
In spring 2019 my Environmental Fieldwork class surveyed the herbaceous plants growing on and around the Tufts campus, recording their identities and locations in a GIS database. For a final project I wrote a simple Cartesian quadrature algorithm in Python to identify the distinct plant pairs most likely to share the same soil. Whether… Continue reading Plant Pairs on the Tufts Campus
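One plausible reading of a grid-based approach is sketched below. It is an assumption about the technique, not the project's actual algorithm. Plant observations, given as `(species, x, y)` in a projected metric coordinate system, are binned into square grid cells of side `cell_size`. Each unordered pair of distinct species is then ranked by how many cells contain both. The function name and the input layout are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def co_occurring_pairs(observations, cell_size):
    """Rank distinct species pairs by the number of grid cells they share.

    observations: iterable of (species, x, y), with x and y assumed to be in a
    projected, metric coordinate system (e.g. meters).
    """
    cells = defaultdict(set)
    for species, x, y in observations:
        # Integer cell coordinates of the square that contains (x, y).
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell].add(species)
    counts = Counter()
    for species_in_cell in cells.values():
        # Sorting makes each unordered pair appear under one canonical key.
        for pair in combinations(sorted(species_in_cell), 2):
            counts[pair] += 1
    return counts.most_common()
```

The cell size controls what "same soil" means. Smaller cells demand tighter co-location, and larger cells reward pairs that merely share a general area.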