Percent of Final Grade: 20%

Due Date: Friday, May 8 at 11:59 PM PT

Team: Groups of 5 students formed after the add/drop deadline. Every team member will participate in the GitHub workflow — not every member needs to write code, but every member must commit (PRD content, issues, reviews, docs, test data, etc.).

Deliverables:

  1. A GitHub repository containing code, the PRD, and issues
  2. A PRD (Product Requirements Document) stored in the repo as Markdown
  3. A deployed URL for your prototype, with new versions deployed automatically by your CI/CD pipeline
  4. A short demo video (4–6 minutes) walking through your PRD, your GitHub workflow, and your running MVP

Submission: Submit one consolidated Canvas entry per team containing (a) the GitHub repo URL, (b) the deployed URL, and (c) the video.

Project Overview

For Milestones 2 and 3, your team takes on the role of the Trust & Safety organization at a real (or realistic) platform, shipping an abuse-prevention product with the software engineering practices of a real tech company. You will pick one abuse type from your team’s Milestone 1 pitches — either a human-abuser scenario (a person harming other people online) or an AI-as-abuser scenario (an AI model itself producing harmful interactions that your system must intercept) — and build a working mitigation on top of modern infrastructure.

Milestone 2 is about laying the engineering foundation: writing down what you are building and why, standing up the repo and deployment pipeline, and shipping a minimum viable prototype. Milestone 3 will add automated detection, evaluation, and the final poster.

AI Usage Policy for Milestones 2 and 3

AI coding assistants are permitted and encouraged for Milestones 2 and 3. You are expected to use tools like Claude Code, Cursor, OpenAI Codex, or similar during your engineering work. Learning to collaborate with AI tooling — including prompt design, reviewing AI-generated diffs, and knowing when to override a suggestion — is a learning objective for this class.

Because AI assistance is available, we grade at a higher bar than in past years. We expect working, deployed software with real web interfaces, not just a proof of concept. Every team must include an AI use statement in the README summarizing which tools the team used and for what kinds of work (e.g., “We used Claude Code to scaffold the Next.js app, GitHub Copilot for test generation, and ChatGPT for debugging a CORS issue.”).

Writing text — the PRD prose, the poster copy in Milestone 3, and the video narration — should be done by team members, not generated wholesale by AI. AI is a research, drafting, and editing tool for text, not the author.

Part 1: Abuse Problem Statement & PRD (25%)

Your team will refine the abuse type chosen from your collective Milestone 1 pitches and produce a complete PRD for the product you will build.

Pick your abuse scenario

We support two project paths this year, both of which are equally valid:

  • Human-abuser path: One or more humans harm other users of a platform online. Examples from Milestone 1: coordinated harassment of journalists, pig-butchering crypto scams, sextortion, hate speech on a streaming platform.

  • AI-as-abuser path: An AI model itself produces harmful or manipulative interactions with humans, and your system is a classifier, guardrail, or moderation layer that sits between the deployed harmful model and the users of a product. Examples from Milestone 1: AI chatbots having sexual conversations with minors, AI systems giving harmful medical advice, AI-powered romance scams, LLM-assisted phishing, deepfake NCII, AI-generated influence operations.

The two paths share the same deliverables.

Required PRD contents

Store your PRD in your repo at docs/PRD.md (or a clearly named equivalent). Use the minimal PRD template we provide in the Team 0 reference repository as a starting point. Required sections:

  1. Problem Statement — what abuse you are solving and for whom, including citations from your Milestone 1 research
  2. Users and User Stories — who uses your system (end users, moderators, admins) and what they need to do
  3. Scope and Non-Goals — what is in this release; what is explicitly deferred
  4. Success Metrics — how you will know it works (false-positive rate, precision/recall targets, user-reported abuse rate, latency, cost per classified item, etc.; the statistical terms are defined just after this list)
  5. System Architecture — a diagram and prose description: components, data flow, external services, inference pipeline
  6. Data Sources and Test Sets — what real or synthetic data you will train/evaluate with, and how you will source or construct it for Milestone 3
  7. Risks and Mitigations — what could go wrong technically, ethically, or at deployment
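
For teammates who have not worked with these metrics before, the standard definitions in terms of true/false positives (TP/FP) and true/false negatives (TN/FN) are:

```latex
\text{precision} = \frac{TP}{TP + FP}
\qquad
\text{recall} = \frac{TP}{TP + FN}
\qquad
\text{false-positive rate} = \frac{FP}{FP + TN}
```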

Extra section for AI-as-abuser teams — Model Safety Spec

Teams on the AI-as-abuser path must include one additional section in the PRD:

  • Model Safety Spec: describe the harmful/malicious AI model you will stand up (e.g., an open-weight model with a harmful system prompt, or an uncensored fine-tune from HuggingFace), the specific harmful behavior you are trying to elicit from it, and the interface between the harmful model, your mitigation layer, and the end user. This is the AI-as-abuser counterpart to “Policy Language” — you are writing the spec against which your mitigation will be measured.

See Running a harmful model below for concrete suggestions on how to stand up the “abuser.”

Part 2: Software Engineering Infrastructure (30%)

Milestone 2’s largest grading weight is on how you work, not just what you build. Real companies ship behind branch protection, code review, CI, and deployment automation. Your team will do the same.

GitHub repository

  • Create a single team repo (GitHub org or a personal account — either is fine). Add the teaching staff as collaborators so we can read it directly; you do not need to attach snapshots for grading.
  • All five team members must have at least one commit to the repository. Commits can be code, PRD edits, documentation, tests, issue templates, README improvements, or test data — but every member must show up in the commit history. Non-CS teammates should commit PRD sections, test cases, policy language, and qualitative evaluation notes.
  • Branch protection on main: direct pushes disabled; merges require a pull request with at least one approving review from a teammate.
  • Use pull requests for all changes. Small, focused PRs are easier to review and are closer to how professional teams work.
  • Use GitHub Issues to track bugs, feature requests, and tasks. Link PRs to issues with Closes #N. Teaching staff will look at your issues board.

CI/CD pipeline (hard requirement)

Set up continuous integration and continuous deployment that automatically builds and deploys your app on every merge to main. Pick one hosting target:

  • Google Cloud (Cloud Run, App Engine, or Cloud Build) — we have provided Google Cloud Education credits to support this option; see our Ed post for how to claim them.
  • Vercel — free tier is sufficient for most student projects, and deployment is near-zero-config for Next.js and similar frameworks; note that Vercel's AI features cost extra.

Your CI pipeline must, at a minimum, run your test suite (even if small) and, on success, deploy the new main to the hosting target.
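
A minimal GitHub Actions sketch of such a pipeline, assuming a Node-based app deploying to Vercel (the file name, Node version, and secret names are illustrative; Cloud Run teams would keep the test job and replace the deploy steps with the google-github-actions/auth and google-github-actions/deploy-cloudrun actions):

```yaml
# .github/workflows/ci-cd.yml (file name is a convention, not a requirement)
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test

  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'  # deploy only after a merge to main
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install --global vercel
      # Assumes VERCEL_TOKEN, VERCEL_ORG_ID, and VERCEL_PROJECT_ID are
      # configured as repository secrets.
      - run: vercel deploy --prod --token=${{ secrets.VERCEL_TOKEN }}
        env:
          VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
          VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}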

Expected engineering artifacts

By the Milestone 2 deadline, your repo should show:

  • A meaningful commit history from all five teammates
  • A non-trivial PR history (not every change needs to be big, but your team should be reviewing each other’s code)
  • An Issues board with real work items
  • A passing CI run on main
  • A working deployed URL
  • A README.md covering setup, deployment, and the AI use statement

Learning GitHub and AI coding tools

This course is also where many of you will pick up professional developer workflow skills. By the end of Milestone 2, every student on the team should be comfortable with:

  • Cloning a repo, creating a branch, pushing commits, opening a PR, requesting review
  • Resolving a merge conflict
  • Reading and commenting on a teammate’s PR
  • Using at least one AI coding assistant (Claude Code, GitHub Copilot, Cursor, etc.) to edit code, review diffs, and write tests
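
For reference, here is one full round of that loop from the command line (repo URL, branch name, and issue number are placeholders; the gh CLI step can equally be done in the GitHub web UI):

```bash
# One focused change: branch, commit, push, open a PR.
git clone https://github.com/your-org/your-repo.git
cd your-repo
git switch -c feature/report-button
# ...edit files...
git add -A
git commit -m "Add report button (refs #12)"
git push -u origin feature/report-button
gh pr create --fill --reviewer a-teammate   # open PR, request review
```

After review comments are addressed and the PR is approved, merge it from the GitHub UI; branch protection on main will block any attempt to push directly.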

TA office hours and evening Zoom sessions with alumni engineers will be available to help teams who are new to this workflow.

Part 3: MVP Prototype (25%)

Ship a minimum viable product — not a finished system. Milestone 3 is where you will add automated detection, evaluation, and polish. For Milestone 2, we want to see the skeleton working end-to-end.

Required surfaces

  • User-facing UI — a web interface that a normal user would interact with (post content, receive messages, submit reports, whatever fits your platform concept). Deployed and reachable at a URL, not just on localhost.
  • Moderator UI stub — a second view where the trust & safety team sees flagged content. For Milestone 2 this may be a simple queue with manual review; Milestone 3 will add automated classification and richer tooling.
  • For AI-as-abuser teams, the product UI should actually call the harmful model, and the mitigation layer must be in the request path even if it is very simple (e.g., a keyword filter for M2, upgraded to classifier + LLM in M3).
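
As a concrete but minimal sketch of what "mitigation layer in the request path" can look like at M2, assuming a TypeScript/Next.js stack (the blocklist, names, and fallback message are all illustrative, not a required API):

```typescript
// keyword-filter.ts: a deliberately simple M2 mitigation layer.
// The blocklist, names, and fallback message are illustrative.

const BLOCKLIST = ["send me your password", "wire the money"];

export interface ModerationResult {
  allowed: boolean;
  matchedTerm?: string;
}

// Check a piece of text (model output or user content) against the blocklist.
export function applyKeywordFilter(text: string): ModerationResult {
  const lowered = text.toLowerCase();
  for (const term of BLOCKLIST) {
    if (lowered.includes(term)) {
      return { allowed: false, matchedTerm: term };
    }
  }
  return { allowed: true };
}

// Wrap any model call so the filter sits in the request path:
// the user only ever sees output that passed moderation.
export async function moderatedReply(
  callModel: (prompt: string) => Promise<string>,
  prompt: string
): Promise<string> {
  const raw = await callModel(prompt);
  const verdict = applyKeywordFilter(raw);
  if (!verdict.allowed) {
    // In a real app, also enqueue `raw` for the moderator UI here.
    return "[message withheld pending review]";
  }
  return raw;
}
```

In M3 you would swap applyKeywordFilter for a real classifier without changing the call site.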

No Discord this year

Past offerings of this class used Discord bots as the starter template. This year we are moving to the web so that students graduate with experience shipping a real hosted product. Do not build on Discord unless Discord itself is the platform your team is modeling.

Running a harmful model (for AI-as-abuser teams)

You will need a model that behaves badly on demand so your mitigation has something to catch. Some options:

  • Uncensored or “abliterated” open-weight models from HuggingFace. Search the HuggingFace Hub for tags like abliterated, uncensored, HERETIC, or dolphin. These are open-weight chat models whose safety tuning has been removed, and they readily produce harmful outputs. Examples of families to look at: dolphin-*, *-abliterated, *-uncensored, Lexi-Uncensored.
  • A regular open-weight model with a harmful system prompt. For some abuse types you don’t need a jailbroken model at all — a standard open model (Llama 3, Mistral, Qwen, etc.) with a system prompt designed to elicit the harmful behavior is plenty.
  • Local inference with Ollama or llama.cpp. With quantization, 7B–13B models run fine on a recent laptop at zero ongoing cost. This is fine for development, but you will need a different solution for the graded milestones, since we require the deployed system to be running 24/7 (a minimal sketch of calling Ollama follows this list).
  • Google Cloud GPU VM (T4 or L4) using the provided Education credits. Best option if you need a larger model or stable hosting.
  • HuggingFace Inference Endpoints. Paid per-second GPU time; easy to expose a model as an API without managing infrastructure.
  • Use credits from another class. We understand that other AI classes have provided GPU credits that might not have run out. ;)
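
To make the system-prompt option concrete, here is a minimal TypeScript sketch against Ollama's /api/chat endpoint (assumes `ollama serve` is running and the model has been pulled; the model name is an example, and the system prompt is a placeholder for whatever your Model Safety Spec defines — be careful about committing the real prompt to a public repo):

```typescript
// abuser-model.ts: stand up the "abuser" via a local Ollama server.
// Assumes `ollama serve` is running and the model has been pulled
// (e.g., `ollama pull llama3`). Model name and prompt are placeholders.

const OLLAMA_URL = "http://localhost:11434/api/chat";
const SYSTEM_PROMPT = "<the harmful persona/scenario from your Model Safety Spec>";

export async function abuserReply(userMessage: string): Promise<string> {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      stream: false, // one complete JSON response instead of a stream
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: userMessage },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.message.content; // Ollama's non-streaming chat response shape
}
```

Routing abuserReply through a wrapper like the moderatedReply sketch in Part 3 puts your mitigation in the request path end-to-end.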

Note: The teaching staff is working on a shared option for teams that would rather not stand up their own harmful model, but this is not guaranteed — plan on running your own, and treat any shared option as a convenience if it materializes.

For image and audio abuse types, similar options exist (e.g., open Stable Diffusion variants, open TTS models); talk to a TA if you need pointers.

Part 4: Writeup and Demo Video (20%)

Writeup (in the repo README.md)

Your README should cover:

  • One-paragraph project summary (abuse type, your approach)
  • Tech stack and hosting target
  • How to run it locally
  • How to deploy (or link to CI config)
  • AI use statement (which tools, which kinds of work)
  • Links to the PRD, the deployed URL, and the team members’ GitHub handles

Demo video (4–6 minutes, hard cap 8 minutes)

Record a walkthrough covering:

  1. A quick tour of the PRD (what is the abuse, what is the product)
  2. Your GitHub workflow — show the issues board, an example PR, and the green CI run
  3. A live demo of the deployed MVP (product UI + moderator UI)
  4. For AI-as-abuser teams: show an example of the harmful model producing something bad, and your mitigation either blocking or flagging it

Google Slides and PowerPoint both have built-in recording. OBS, Loom, and Zoom screen-sharing also work. Upload either slides with embedded audio or a link to an unlisted YouTube video (i.e., one that can only be accessed via a direct link). We will not accept links to videos uploaded any other way.

Grading Criteria

Part 1 — Abuse Problem & PRD (25%)

  • PRD Completeness (10%): all seven required sections are present, specific, and consistent with each other. For AI-as-abuser teams, the Model Safety Spec is included and concrete.
  • Problem Clarity (5%): abuse type is scoped specifically enough to be buildable and evaluable.
  • Success Metrics (5%): measurable, justified by the abuse literature, and tied to the evaluation you will do in Milestone 3.
  • Architecture (5%): diagram and description show a real plan, not a wish list. Components and data flow are explicit.

Part 2 — Software Engineering Infrastructure (30%)

  • Repo hygiene (10%): branch protection on main, meaningful commit messages, PRs with review, linked issues.
  • Contribution distribution (5%): all five teammates have commits; no single-author repos.
  • CI/CD working (10%): green CI on main, automatic deploy on merge, deployed URL reachable.
  • Issues and review (5%): real tracked work items; PRs show actual review comments, not rubber-stamps.

Part 3 — MVP Prototype (25%)

  • Product UI (10%): a real user flow exists and is usable.
  • Moderator UI (10%): moderator can see and act on flagged content, even if detection is manual for now.
  • End-to-end working (5%): a complete cycle (user action → moderator decision) runs on the deployed site.
  • AI-as-abuser teams (5% — in place of equivalent product-UI depth): harmful model is actually running and integrated; mitigation layer is in the path even if simple.

Part 4 — Writeup and Demo Video (20%)

  • README completeness (5%): summary, stack, setup, deployment, AI use statement, links.
  • Video clarity (10%): all four walkthrough items covered, audio is clear, under 8 minutes.
  • AI use statement (5%): honest, specific summary of how the team used AI tools.

Extra Credit — Victim Interview (10%)

  • In the real world, Trust & Safety teams take the experiences of real victims into account, and an interview with a victim or a relative of a victim should directly inform your PRD and solution. To get credit, include the interview in the repo as TRANSCRIPT.md (AI transcription tools are fine for producing the transcript).

Division of Labor

Non-CS students: please lead the PRD (especially problem statement, user stories, risks, and policy considerations), the test-set construction for Milestone 3, and the qualitative evaluation work. This is real product work — the best T&S teams in industry are cross-functional for exactly this reason. You should also commit PRD edits, issues, and review comments to the GitHub repo; you don’t need to write application code to meaningfully contribute.


Questions? Post publicly on Ed (so other teams can see the answer) or privately if your question reveals something about your abuse type. Office hours and alumni evening Zooms will be announced separately.