Formulating Research Questions

Good research begins with a well-formulated question. This guide teaches you how to identify gaps in the literature and develop testable hypotheses.

Identifying Research Gaps

Good research fills a gap. There are four main types of gaps:

Methodological gap: No suitable technique exists yet for the problem
Empirical gap: A method or claim hasn’t been evaluated on specific data
Theoretical gap: A phenomenon is not yet explained
Application gap: An existing method hasn’t been applied to a specific domain

Example: Healthcare AI Thesis

A healthcare AI thesis might address multiple gap types:

Methodological: Multimodal fusion of structured EHR + text + sketches
Empirical: Patient-reported symptoms for ED outcome prediction
Application: 3D symptom drawings in clinical prediction models

The PICOT Framework

Adapted from medical research, PICOT helps structure research questions:

Population: Who are you studying? (e.g., Emergency department patients)
Intervention: What are you testing? (e.g., Multimodal AI prediction model)
Comparison: What’s the baseline? (e.g., Structured EHR-only models)
Outcome: What are you measuring? (e.g., Prediction accuracy for mortality, admission)
Time: When/how long? (e.g., Prospective validation over 6 months)

Example PICOT

P: Emergency department patients
I: Multimodal transformer (EHR + symptoms + sketches)
C: ETHOS (structured EHR only)
O: Outcome prediction accuracy (AUROC, AUPRC)
T: Retrospective analysis of 50,000 visits (2020-2024)
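
One way to keep the five PICOT elements explicit while drafting is to store them in a small record and generate the question sentence from it. The sketch below is illustrative only: the `Picot` class, its `as_question` method, and the wording template are hypothetical and not part of any library or of the PICOT framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picot:
    """Hypothetical container for the five PICOT elements."""

    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def as_question(self) -> str:
        # Assumed wording template; adapt it to your field's conventions.
        return (
            f"In {self.population}, does {self.intervention} improve "
            f"{self.outcome} compared with {self.comparison}, "
            f"based on {self.time}?"
        )


# Values taken from the example PICOT above.
ed_example = Picot(
    population="emergency department patients",
    intervention="a multimodal transformer (EHR + symptoms + sketches)",
    comparison="ETHOS (structured EHR only)",
    outcome="outcome prediction accuracy (AUROC, AUPRC)",
    time="a retrospective analysis of 50,000 visits (2020-2024)",
)

print(ed_example.as_question())
```

Writing the question this way makes a missing element obvious: if you cannot fill in one of the five fields, the question is not yet fully specified.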

Hypothesis Formation

Types of Hypotheses

Type	Example
Null Hypothesis	Adding patient-reported symptoms does NOT improve predictions over structured EHR alone
Alternative Hypothesis	Adding patient-reported symptoms improves predictions over structured EHR alone (tested at significance level α = 0.05)
Research Question	To what extent do multimodal patient-reported symptoms improve ED outcome prediction?
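
To make the null and alternative hypotheses concrete, one possible test is a paired bootstrap on the AUROC difference between the two models, scored on the same test visits. The sketch below is a minimal illustration under stated assumptions: the function names `auroc` and `bootstrap_auroc_gain` are hypothetical, the scores are synthetic and do not come from any study, and the percentile interval is only one of several reasonable ways to compare models.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_gain(labels, base_scores, new_scores, n_boot=300, seed=0):
    """Paired bootstrap: resample visits, recompute AUROC(new) - AUROC(base)."""
    rng = random.Random(seed)
    n = len(labels)
    gains = []
    while len(gains) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples containing only one class
        gains.append(
            auroc(y, [new_scores[i] for i in idx])
            - auroc(y, [base_scores[i] for i in idx])
        )
    gains.sort()
    # Approximate 95% percentile interval for the AUROC gain.
    lower = gains[int(0.025 * n_boot)]
    upper = gains[int(0.975 * n_boot) - 1]
    return lower, upper


# Synthetic scores for illustration only; these are not results from any study.
data_rng = random.Random(42)
labels = [i % 2 for i in range(120)]
ehr_only = [0.5 * y + data_rng.gauss(0, 1) for y in labels]
multimodal = [1.5 * y + data_rng.gauss(0, 1) for y in labels]

lower, upper = bootstrap_auroc_gain(labels, ehr_only, multimodal)
print(f"95% interval for AUROC gain: [{lower:.3f}, {upper:.3f}]")
print("Reject null" if lower > 0 else "Cannot reject null")
```

An interval that lies entirely above zero is evidence against the null hypothesis at roughly the two-sided 5% level. Whichever test you choose, fix it before looking at the test-set results.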

Characteristics of Good Hypotheses

A good hypothesis should be:

Testable: Can be evaluated empirically
Specific: Clear about what is being measured
Falsifiable: Could potentially be proven wrong
Relevant: Addresses an important question
Novel: Adds something new to the field

Scoping Your Research

The Research Triangle

Balance three competing factors:

Scope: How broad is the question?
Depth: How thoroughly will you investigate?
Time: How long do you have?

For a thesis: narrow scope + good depth + 6 months = publishable work

Avoiding Common Pitfalls

Too Broad: “How can AI improve healthcare?”

Problem: Impossible to answer in one thesis

Too Narrow: “Does batch size 32 work better than 31 for this specific dataset?”

Problem: Not interesting or generalizable

Just Right: “Can multimodal transformers fusing structured EHR with patient-reported symptoms improve ED outcome prediction compared to EHR-only models?”

Why it works: Specific, testable, and impactful

Example Thesis Statement

“This thesis investigates whether multimodal transformers that fuse structured EHR data with patient-reported symptom text and anatomical sketches can improve emergency department outcome prediction compared to models using structured data alone.”

Breaking Down the Statement

Problem: ED outcome prediction
Method: Multimodal transformers
Data: EHR + text + sketches
Comparison: Structured data alone
Evaluation: Improvement in prediction accuracy

From Question to Contribution

Your research question should lead to clear contributions:

Methodological Contributions

Novel architecture for multimodal fusion
Techniques for handling sparse patient-reported data
Methods for incorporating anatomical sketches

Empirical Contributions

First evaluation of patient symptoms for ED prediction
Comparison of fusion strategies
Ablation studies showing component importance
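
Comparing fusion strategies and running ablations both come down to evaluating the same model on different subsets of input modalities. A minimal sketch of enumerating those configurations follows. The modality names come from this guide, while the `ablation_configs` helper and the rule that structured EHR is always included are assumptions made for illustration.

```python
from itertools import combinations

MODALITIES = ("ehr", "symptom_text", "sketches")


def ablation_configs(modalities=MODALITIES, always_include=("ehr",)):
    """Yield every modality subset that contains the always-included inputs."""
    optional = [m for m in modalities if m not in always_include]
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            yield tuple(always_include) + extra


for config in ablation_configs():
    print(" + ".join(config))
# Prints:
# ehr
# ehr + symptom_text
# ehr + sketches
# ehr + symptom_text + sketches
```

Listing the configurations up front helps with the feasibility check: each row is one training run, so the table size tells you how much compute the empirical contribution requires.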

Practical Contributions

Demonstrated improvement over clinical baselines
Interpretable predictions for clinicians
Deployable system architecture

Validating Your Research Question

Before committing to a research question, check:

Literature check: Has this been done before?
Feasibility check: Can you get the data and compute?
Timeline check: Can you complete this in 6 months?
Impact check: Will this be publishable and useful?
Supervisor check: Does your advisor agree?

Connection to Literature Review

Your research question emerges from reading papers:

Read broadly: Understand the field (30-40 papers, Pass 1-2)
Identify patterns: What’s been done? What’s missing?
Find gaps: Where can you contribute?
Formulate question: Specific, testable, novel
Design experiments: Plan experiments that can answer the question

Refining Your Question

Iterate on your research question:

Initial (too broad): “Multimodal AI for healthcare”

Refined: “Multimodal AI for ED outcome prediction”

More specific: “Fusing EHR and patient symptoms for ED outcomes”

Final: “Multimodal transformers combining structured EHR, symptom text, and anatomical sketches for ED outcome prediction vs. EHR-only baselines”

Exercise: Write Your Research Question

Using the PICOT framework, write:

Your research question in one sentence
Your null hypothesis
Your alternative hypothesis
Three specific contributions your work will make

Examples from Machine Learning Research

Computer Vision

Question: “Can a pure transformer applied to image patches match or exceed CNN performance on ImageNet when pre-trained at scale?”

Gap: Transformers successful in NLP but not proven in vision
Contribution: ViT paper showing transformers competitive with CNNs

Multimodal Learning

Question: “Can natural language supervision enable zero-shot visual recognition?”

Gap: Fixed-class supervised learning doesn’t scale
Contribution: CLIP enabling zero-shot transfer via language

Healthcare AI

Question: “Can generative pretraining on tokenized EHR event sequences enable zero-shot clinical prediction?”

Gap: Healthcare models need labeled data for each task
Contribution: ETHOS showing zero-shot generalization

Related Guides

Reading Research Papers - Identifying gaps through literature
Experimental Design - Testing your hypotheses
Structuring Papers - Communicating your findings

Key Takeaways

Identify gaps: Four types - methodological, empirical, theoretical, application
Use PICOT: Structure your question systematically
Scope appropriately: Narrow scope + good depth + realistic timeline
Make it testable: Formulate clear null and alternative hypotheses
Plan contributions: Know what you’ll contribute before starting
Validate early: Check feasibility, novelty, and impact before committing