Classroom Activity: Large Language Model Comparison
Estimated time: about 10 to 30 minutes (depending on how many prompts the instructor or students want to use)
Description: Using DartmouthChat's side-by-side feature, send the same prompt to two or more different models, for example GPT-4o mini, Gemini 2.0 Flash, or Llama 3.2 11B, and compare the responses they generate. Assess each response for accuracy, bias, and limitations. Then ask the students to share their observations and experiences, which model they prefer, and why.
Structure:
Instructors can project their computer screen and conduct the activity with the entire class, OR have students work in pairs or small groups.
Discuss what students may or may not know about these different models. They most likely have heard about OpenAI's GPT-4 series, but what do they know about Anthropic's Claude series or Mistral?
Provide a list of proposed prompts OR have students come up with prompts themselves.
Question types: recount historical events, solve math problems, report current events, write a fictional short story (something creative).
Students can ask the LLM "Are you sure?" to start probing potential hallucinations and misinformation, and to explore why these happen in the first place.
Learning outcomes:
Become familiar with contemporary chatbot models and be able to interact with multiple GenAI modalities (potentially text and images).
Gain a preliminary understanding of how GenAI operates and of LLM-human interaction; interact with and use GenAI and LLMs confidently.
Learn that there is not a single thing called "AI", and see how AI works, how different LLMs treat the same inputs, and how LLMs interpret natural language.