Open to opportunities

Hi, I'm Yuan Shan.

M.S. in Statistical Science · Duke University. Deeply passionate about large language models and reinforcement learning, and currently researching how RL can drive self-evolution in language models.

Portrait of Yuan Shan
Name: Yuan Shan / 单元
Role: M.S. Statistical Science · Duke
01

About

I'm a Master of Statistical Science student at Duke University.

Off campus, I love sports — football, swimming, and karting are my favorites. I'm a die-hard Real Madrid fan.

My favorite band is Mayday (五月天), and my favorite movies are the ones starring Stephen Chow (周星驰).

02

Research

Publications and ongoing work.

Publications

Accepted · ICASSP 2026 Conference Paper

MEOW: A Metadata-Driven, End-to-End LLM Framework for Academic Survey Outline Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)

We propose MEOW, the first metadata-driven, end-to-end LLM framework for academic survey outline generation. Our pipeline combines supervised fine-tuning (SFT) with reinforcement learning (GRPO) under the RLHF paradigm, leveraging Chain-of-Thought annotations to guide structured taxonomy reasoning. We curate more than 20k surveys from arXiv, bioRxiv, and medRxiv and design multi-dimensional metrics (structure, content, pragmatics) for outline evaluation. Optimizing Qwen3-8B (full fine-tuning and LoRA) with reward functions targeting structural distance and format compliance, MEOW achieves state-of-the-art performance on our benchmark, surpassing baselines such as SurveyX and outperforming strong LLMs (e.g., GPT-5 Nano, DeepSeek-R1).

  • LLM
  • SFT + GRPO (RLHF)
  • Qwen3-8B
  • vLLM
  • Survey Generation

Working Papers

In Progress · Working Paper

Self-Evolving LLM Agent — title forthcoming

Currently being drafted.

A research project on self-evolving LLM agents. Details, abstract, and a preprint will be added here once the manuscript is ready. Stay tuned.

  • LLM Agent
  • Self-Evolution
  • Coming Soon
03

Projects

A selection of recent work — most from the last year.

Apr 2026 · LLM

Music Chat Recommender

Multi-turn LLM chatbot for personalized music recommendations. Built with Gradio + Groq Llama 3.3 70B, integrating the iTunes Search and YouTube Data APIs. Live demo deployed on Hugging Face Spaces.

  • Python
  • Gradio
  • Llama 3.3
  • Groq
Apr 2026 · Causal ML

PA & Mental Distress: Age-Dependent Heterogeneity

Causal machine learning analysis of physical activity's effect on mental distress across 1.9 million U.S. adults (BRFSS 2015–2024), uncovering age-dependent treatment heterogeneity.

  • Causal Forests
  • BRFSS
  • Heterogeneous TE
  • Python
Apr 2026 · DOE

Optimizing Turbine Blade Design

Design-of-experiments and surrogate-model optimization for turbine blade geometry, balancing aerodynamic efficiency against manufacturing constraints.

  • DOE
  • Surrogate Models
  • Python
Apr 2026 · DOE

Paper Helicopter — Sequential DOE

STA 643 project: end-to-end design optimization via fractional factorial screening, confirmation runs, RSM optimization, and validation experiments.

  • Fractional Factorial
  • RSM
  • R / Python
Mar 2026 · Sports Analytics

Bradley–Terry Analysis · Premier League

Bradley–Terry modeling of English Premier League seasons to estimate latent team strengths and quantify home-field advantage over time.

  • Bradley–Terry
  • Jupyter
  • Sports Analytics
Jan 2026 · Reproducibility

HIV Self-Testing Study · Reproduction

A careful reproduction of a published HIV self-testing study, validating statistical methodology, effect estimates, and robustness checks.

  • Reproducibility
  • R
  • Public Health
Sep 2025 · LLM

Survey Outline Evaluation Benchmark

End-to-end pipeline for generating & evaluating academic survey outlines with LLMs — multi-dimensional metrics, statistical analysis, and a public benchmark. Companion repo to the ICASSP 2026 paper.

  • Benchmark
  • Evaluation
  • Python
2023 · 2024 · Visualization

Football Transfer Market Visualization

Team-led D3.js dashboard analyzing the evolution of the global football transfer market, with an interactive UI and a Python data pipeline. Live demo available.

  • D3.js
  • JavaScript
  • Python
04

Internship Experience

Industry and research internships.

  1. May 2025 – Oct 2025

    LLM Algorithm Engineer Intern

    Wenge Tech

    Work accepted at ICASSP 2026

    • Designed and built MEOW — the first metadata-driven, end-to-end LLM framework for academic survey outline generation, combining SFT with reinforcement learning (GRPO) under RLHF.
    • Constructed a complete evaluation stack: a public benchmark plus multi-dimensional metrics covering structure, content, and pragmatics for outline quality assessment.
    • Curated and distilled more than 20k surveys from arXiv, bioRxiv, and medRxiv; built a Chain-of-Thought (CoT) annotation layer to guide structured taxonomy reasoning and improve survey coherence.
    • Optimized Qwen3-8B (full fine-tuning and LoRA) end-to-end; designed custom reward functions on structural distance and format compliance to align generation with human writing preferences.
    • Leveraged vLLM to accelerate both inference and RL training; the resulting model achieved state-of-the-art performance on the benchmark, surpassing baselines such as SurveyX and outperforming strong LLMs (GPT-5 Nano, DeepSeek-R1).
  2. Jul 2024 – Aug 2024

    Data Engineering Intern

    Lakala Payment

    • Assisted in the development and maintenance of the company's internal data platform using Hadoop ecosystem tools.
    • Designed and developed the internal data platform's data quality monitoring system.
  3. Jun 2023 – Oct 2023

    Research Intern

    Beijing Academy of Artificial Intelligence (BAAI)

    • Contributed to BAAI's 3D electron-microscopy (EM) processing toolchain, with a focus on advanced neuron-segmentation methodologies including PyTorch Connectomics (PyTC) and Local Shape Descriptors (LSD).
    • Built reproducible training pipelines for large-volume connectomics data, covering preprocessing, augmentation, and patch-based inference for cubic-micron EM volumes.
    • Optimized segmentation models via systematic hyperparameter tuning combining Grid Search and Bayesian Optimization, materially improving downstream segmentation quality on benchmark volumes.
  4. Jun 2022 – Aug 2022

    Machine Learning Engineering Intern

    Tencent · Tencent Video

    • Independently owned an end-to-end ML project to surface high-potential video categories on the Tencent Video platform — running the full pipeline from data extraction and cleaning, through algorithm design and evaluation, all the way to written reporting for product stakeholders.
    • Built scalable data pipelines on Tencent's internal task-scheduling platform, handling dependency resolution across heterogeneous data sources for a reproducible workflow.
    • Engineered an XGBoost-based scoring model to produce quantitative indicators of video-category "potential", surfacing under-served verticals with strong growth signal; tuned the model with cross-validated hyperparameter search.
    • Wrote production SQL and Python against large-scale internal data warehouses to extract, join, and transform features at scale.
    • Deployed Spark for distributed feature engineering and model scoring, keeping the pipeline tractable on volumes of up to 7,500,000 records.
    • Synthesized findings into two project reports that fed directly into downstream content-strategy and operational decisions on Tencent Video.
05

Education

Duke University

M.S. in Statistical Science · Aug 2024 – May 2026

B.S. in Interdisciplinary Studies (Data Science) · Jul 2020 – Aug 2024

Duke Kunshan University

B.S. in Data Science · Jul 2020 – Aug 2024

Honors & Leadership

  • Dean's List with Distinction · Duke Kunshan University
  • Finalist Prize · Interdisciplinary Contest in Modeling (MCM/ICM)
  • First Prize · National Olympiad in Informatics in Provinces (NOIP)
  • Technical Intern · Beijing 2022 Olympic & Paralympic Winter Games Organising Committee (Dec 2021 – Jan 2022)
  • Outstanding Peer Mentor & Outstanding Peer Tutor · DKU (2022–2024)
  • Volunteer · Duke China-U.S. Summit 2023
  • Student Leader · Kunshan Student Orientation Peers (KSOP)
  • Best Volunteer · 7th China Buyout Fund Annual Conference
06

Get in touch