Data Scientist/Program Evaluator
Experienced in applying advanced statistical methods and machine learning to identify systemic inefficiencies and maximize ROI for large organizations. Proven ability to lead cross-functional teams—including data scientists, accountants, and software engineers—to extract actionable insights from large-scale datasets, with results regularly informing decisions at the senior executive level, including the Board of Management.
- Leadership: Directed a team of data scientists, accountants, and software engineers, fostering collaboration to analyze hundreds of millions of records.
- Technical Expertise: Proficient in statistical analysis, machine learning, and data visualization to drive informed decision-making.
- Strategic Insights: Utilized data-driven strategies to identify inefficiencies, optimize processes, and enhance organizational performance.
- Negotiation Skills: Expert in negotiating project scope and deliverables with stakeholders to ensure alignment and successful outcomes.
- Consensus Building: Adept at gathering consensus among diverse teams and stakeholders to drive initiatives forward and achieve common goals.
🧠 My personal blog is live! → 📓 Follow my data journey—projects, insights, and lessons learned.
Work Experience
Data Scientist/Program Evaluator @ Canada Revenue Agency
- Led the execution of geospatial analysis on behavioral data to measure engagement and enhance service.
- Led the planning and execution of various program evaluations involving thousands of data elements. Led the development of tools that identified XXX million dollars of undetected revenues, leveraging custom graph algorithms and complex data pipelines.
- Improved search functionality for stakeholders, achieving 100x efficiency gains; developed tools to summarize high-dimensional tables on enterprise SQL servers to enhance data accuracy and quality assurance.
Talks
I have presented to a wide variety of audiences, including analysts, managers, directors, directors general, and C-suite executives in the public service in Canada and the United States. Below is a sample of the data talks I have given:
- CRA Lunch and Learn
  - Summarization of High-dimensional Tables on database servers (GitHub)
  - Summarization of High-dimensional Tables on database servers, with RShiny Interface (GitHub)
  - Create Organizational Maps and Search Tools with Webscraping and Semantic Search
- GC Data Lunch and Learn
  - Making Your Tables Pretty with Reactable
- Employment and Social Development Canada (ESDC)
  - Create Webpage Maps to Visualize and Measure User Experience Digital Journey (GitHub)
- Public Service Data Challenge (First Place Team Winner) News
  - AgSearch (precursor to AgGPT) - An agriculture search engine using a hybrid search strategy to answer long questions and increase search relevance
  - AgGPT - An AI Chat Assistant to Help Farmers and Agri-businesses Find Resources
- Evaluation GenAI Committee
  - Create Organizational Maps with Text Analytics, and Deploy Local Language Models for Program Evaluations
- Presented critical gaps in containerization, MLOps, and CI/CD practices to directors across departments, including Shared Services Canada, Treasury Board Secretariat, and Global Affairs Canada, translating complex issues for both technical and non-technical audiences.
Projects
- Ontario Rent Contract Review: Developed an online compliance checker (ReadMyLease.ca) for Ontario Form 400 rental agreements, parsing PDFs and using LLM-driven legal analysis to validate lease clauses against the Residential Tenancies Act and Ontario Standard Lease Appendix—flagging issues like unlawful deposits, guest restrictions, and maintenance lapses. I manage the full CI/CD workflow on my own Linux server, with a Next.js frontend and a Python backend. Demo: YouTube Link.
- Automated Invoicing System: Developed a Streamlit-based UI with a Python backend to generate and email invoices via the Gmail API. The application reads historical data to auto-fill fields, enabling doctors to quickly create and send invoices. Deployed locally for doctors using Docker for easy distribution and environment consistency.
- Backtesting Software in R: Developed a robust backtesting tool in R to evaluate the historical performance of various technical trading strategies and company fundamentals (stocks and options).
- Python Real Estate Simulator: Created a comprehensive real estate simulation tool in Python to pinpoint investment opportunities. The tool conducts neighborhood analysis and performs computer appraisals (sales comparison and income approaches). It also includes a payment simulation feature that forecasts cash flow changes due to interest rate fluctuations.
- LLM for Program Evaluation: Developing a RAG-based local LLM optimized for program evaluation tasks. (Streamlit App)
Small Apps & Random Things
- What is my CMA for the Canada Carbon Rebate (CCR)? Find your Census Metropolitan Area (CMA) by typing your address or postal code. App
- How good was Charlie 2.0? How good was the CRA chatbot at answering the CRA FAQs? Find out on this GitHub.
- How effective is government outreach? I use geospatial analysis and census data to improve the understanding of social and economic dynamics for government policies. This provides a practical guide on Gender-Based Analysis Plus (GBA+) to evaluate the impact on vulnerable populations.
Tutorials
- How to set up an LLM on Google Colab (llama-cpp) (GitHub)
- How to set up a local LLM and make a local inference endpoint (llama-cpp, FastAPI) (GitHub)
- Process mapping with Markov chains and probabilistic graphical models: how do we model complex business processes and find bottlenecks? (GitHub)