Disclaimer: In the spirit of this course, this website is being developed with the assistance of Generative AI tools.

CIS 7000: Accelerating Research with Generative AI (Fall 2025)

This seminar explores practical approaches for researchers to effectively utilize Generative AI models through hands-on experimentation and exploration. Students will investigate applying these models to research tasks such as summarizing literature, writing assistance, code development, and visual aid creation. The course also aims to challenge your preconceptions of generative AI and inspire new perspectives on the ways in which it can be used. Potential topics include effective prompting, critical evaluation of outputs, and responsible use considerations including relevant ethics and safety. Students are expected to be familiar with and actively engaged in research methodology.

For instance, we might ponder questions like:

How good is DeepResearch at doing comprehensive literature review?
Are LLMs good at picking up new software libraries?
Can LLMs create good presentations?

The core philosophy of this course is learning by doing and analyzing. We will actively try using various LLMs and related tools for research tasks, report back on our experiences, and collectively analyze each other’s results. This collaborative process will help us understand the practical capabilities, limitations, and effective usage patterns of these powerful technologies in a research context.

Furthermore, understanding how generative AI works at a fundamental level may potentially help us utilize these tools more effectively. Therefore, while the primary focus is on application, we may also delve deeper into the common generative AI technologies that underpin the tools we use.

Course Structure and Expectations

This hands-on seminar requires active student engagement in discussions and explorations of GenAI tools. Students will experiment with applying LLMs and other generative models to various research tasks (similar to those listed below), focusing on prompt engineering, utilizing different platforms, and analyzing results. A key component involves critically evaluating AI outputs for accuracy, bias, relevance, and ethical implications. The course culminates in a project where students apply GenAI to accelerate a part of their own research pipeline or a defined problem, evaluating its effectiveness and sharing their findings and techniques with the class. Furthermore, students will be encouraged to engage with and provide feedback on their peers’ projects, fostering a collaborative learning environment.

Instructor: Eric Wong (exwong@cis)

Class: F 1:45pm-4:44pm, AGH 214

Website: https://www.cis.upenn.edu/~exwong/arga/

Registration: To register, you need to sign up both on courses.upenn.edu and also submit the questionaire on the CIS waitlist.

Potential Applications / Topics

This course aims to explore how Large Language Models (LLMs) might be applied across the research lifecycle. We may explore and experiment with potential applications such as:

I. Ideation and Literature Review

Topic Exploration & Brainstorming: Generating research questions, exploring related concepts, identifying potential angles.
Literature Discovery: Suggesting relevant papers, authors, or journals based on initial ideas or abstracts.
Paper Summarization: Creating concise summaries of research papers or specific sections.
Understanding Complex Concepts: Explaining difficult methodologies, theories, or terminology found in papers.
Identifying Research Gaps: Analyzing existing literature (provided by the user) to highlight underexplored areas.
Synthesizing Information: Grouping related findings or arguments from multiple sources.
Translation: Translating abstracts or key sections of papers written in other languages.

II. Research Design and Planning

Methodology Brainstorming: Suggesting potential experimental designs or data analysis techniques.
Grant Proposal Assistance: Drafting sections like literature review, background, or potential impact.
Data Management Planning: Outlining potential data organization strategies or documentation needs.

III. Data Collection and Analysis

Survey/Questionnaire Design: Generating draft questions or structuring survey flow.
Generating Synthetic Data (Use with Caution): Creating placeholder or example datasets for testing analysis pipelines (requires careful validation).
Qualitative Data Exploration: Assisting in thematic analysis by suggesting potential themes or categories in text data (requires significant human oversight).
Interpreting Results: Rephrasing complex statistical outputs into more understandable language (verify accuracy carefully).

IV. Code Generation and Development

Code Generation (Simulations/Scripts): Creating initial code drafts for simulations, data processing scripts, or experimental setups (e.g., Python, R, MATLAB).
Code Generation (Analysis): Writing code snippets for specific statistical analyses or data visualization using libraries (e.g., Python’s Matplotlib/Seaborn, R’s ggplot2).
Code Debugging: Identifying errors or suggesting improvements in analysis scripts.
Code Documentation: Generating comments or explanations for code sections.

V. Theory Development and Formal Reasoning

Structuring Proofs: Assisting in outlining logical steps for mathematical or theoretical proofs (requires careful validation).
Exploring Counterexamples: Brainstorming potential edge cases or counterexamples to test hypotheses.
Formal Verification Assistance: Helping translate natural language specifications into formal logic or code for verification tools (requires significant expertise and validation).
Mathematical Formula Manipulation: Assisting with symbolic manipulation or checking steps in derivations (use with extreme caution and verify results).

VI. Writing and Manuscript Preparation

Drafting Assistance: Generating initial drafts for sections like Introduction (background), Methods (descriptions based on user input), or Discussion (potential interpretations).
Rephrasing and Paraphrasing: Rewriting sentences or paragraphs to improve clarity, flow, or conciseness, or to avoid repetition.
Grammar and Style Checking: Identifying grammatical errors, suggesting stylistic improvements, and ensuring consistency.
Abstract Generation: Creating draft abstracts based on the main findings and manuscript content.
Citation Management Assistance: Formatting references based on specific style guides (though dedicated tools are often better).
Generating Figure/Table Captions: Drafting descriptive captions based on the data presented.

VII. Dissemination and Presentation

Presentation Outline Creation: Generating a logical structure for a research talk.
Slide Content Generation: Drafting text points, summaries, or explanations for presentation slides.
Speaker Note Generation: Creating draft speaker notes to accompany slides.
Visual Aid Ideas: Suggesting types of visuals (charts, diagrams) appropriate for presenting specific data or concepts.
Poster Content Generation: Assisting in drafting text sections for academic posters.
Responding to Reviewers: Helping draft responses to reviewer comments by outlining arguments or rephrasing points.
Lay Summaries: Translating technical research findings into language accessible to a general audience.

VIII. Electronic Communication and Online Presence

Email Drafting: Assisting with composing professional emails (e.g., to collaborators, reviewers, for inquiries).
Website Content: Generating or refining text for personal research websites, lab websites, or project pages.
Social Media/Blog Posts: Drafting posts for disseminating research findings or engaging with the academic community online (e.g., Twitter threads, blog entries).
Public Communication: Assisting in writing content for platforms like Wikipedia or public science communication outlets.

Important Considerations:

Throughout the course, we will emphasize:

Accuracy & Verification: LLM outputs must always be critically evaluated for factual accuracy, correctness (especially code and formal reasoning), and potential biases.
Originality & Plagiarism: Ensure that LLM-generated text is properly attributed or significantly rewritten to avoid plagiarism. Understand journal/conference policies on AI use.
Confidentiality: Be cautious about inputting sensitive or unpublished data into public LLMs.
Ethical Use: Consider the ethical implications of using AI in research, including data privacy and potential biases in the models themselves.

Schedule

Tentative schedule. Topics are subject to change.

Date	Topic	Notes
Aug 29	Writing	Related work: Worksheet 1 & Report generation: Worksheet 2
Sep 05	Writing	Abstracts & Intros: Worksheet 3 & Figures: Worksheet 4
Sep 12	Writing	Reviewing & editing: Worksheet 5
Sep 19	AI Research Mixer (No class)
Sep 26	Coding	AI coding: Worksheet 6
Oct 03	Coding
Oct 10	Fall Term Break (No class)
Oct 17	Studies
Oct 24	Superprompting	Megaprompting: Worksheet 7
Oct 31	Proofs	Guest instructor: Surbhi (Worksheet)
Nov 07	Rubrics and Learning	Rubrics: Worksheet 9 & Learning: Worksheet 10
Nov 14	Studies
Nov 21	Class Projects	Presentations
Nov 26	Thanksgiving (No class)
Dec 05	NeurIPS conference (No class)