Behavior of Chain-of-thought Monitorability
Studying how monitor effectiveness changes as the capability gap between monitor and target models widens, with a case study on distinguishing sandbagging from genuine incapability.
A collection of projects I've worked on, ranging from academic research to personal explorations. If you think it would be useful or interesting to collaborate on a project, please contact me to discuss.
Praxis Project Trial. ICM, an unsupervised method, elicits human concepts from base language models by maximizing mutual predictability and local consistency among concept-related examples.
Minimalist Llama 2 implementation in PyTorch, exploring LLM components such as RoPE, self-attention, and the AdamW optimizer.
Jekyll-based static site built from scratch for SEACrowd, a grassroots organization for Southeast Asian AI research.
Event management platform connecting local sports enthusiasts. CodePath Advanced Webdev full-stack project.
Full-stack Progressive Web App for seizure tracking with predictive warnings using XGBoost and LSTM models, built with React and Flask.
AI-powered personal knowledge assistant using LlamaIndex and the OpenAI API to query an Obsidian knowledge base with a RAG architecture.
Replication study of the impact of Philadelphia's excise tax on beverages, using synthetic control methods for causal inference in R.
Analysis of GP visit patterns using zero-inflated Poisson models with complete and partial pooling, Bayesian inference, and data imputation.
Text classification system that automatically classifies notes and assignments using SVM and Naive Bayes, reaching 87% accuracy.
My GitHub contributions over the past year. Colored squares represent days with commits.