This paper is about student responses to more difficult Big Questions (BQs). First, it asks whether Japanese university students, teachers, and AI agree about the difficulty of BQs. Second, it examines whether student performance correlates with question difficulty.
What AI says about research into this topic re: inquiry-based learning (IBL):
The research generally indicates that the relationship is not linear; rather, it follows an inverted-U ("Goldilocks") curve, where the best performance occurs when the difficulty matches the student's developmental stage and is supported by appropriate scaffolding.
Here are the key findings from peer-reviewed literature regarding this correlation:
1. High-Order vs. Low-Order Questions (Bloom’s Taxonomy)
Research often categorizes question difficulty using Bloom’s Taxonomy.
- The Correlation: Studies (e.g., Chin & Brown, 2002) show that “low-order” questions (factual recall) lead to surface learning, while “high-order” questions (analysis, synthesis, evaluation) are essential for the “deep learning” characteristic of IBL.
- The Caveat: Higher difficulty (complexity) imposes a higher cognitive load. If the question is too complex relative to students' prior knowledge, performance drops due to cognitive overload (Sweller, 1988).
2. The Meta-Analysis of Scaffolding (Lazonder & Harmsen, 2016)
In one of the most cited meta-analyses on the subject, Lazonder and Harmsen reviewed 72 studies to see how different levels of inquiry guidance (which effectively adjusts the “functional difficulty” of a question) affect performance.
- Finding: They found a positive correlation between the level of guidance and learning outcomes. Specifically, as the difficulty of the task increases, the necessity for scaffolding increases.
- Impact on Performance: When difficult questions were paired with high scaffolding (like prompts or cues), student performance was significantly higher than when students were given “easy” questions with no guidance.
3. The “Big Questions” in SOLE (Mitra et al.)
Given your background in SOLE (Self-Organized Learning Environments), the research by Sugata Mitra and colleagues is highly relevant.
- The Theory of “Big Questions”: Mitra’s research suggests that for IBL to be effective, the question must be “Big”—meaning it should be difficult enough that an individual cannot answer it alone, but a group can using technology.
- The Correlation: In SOLE-focused studies, “difficulty” is a catalyst for collaboration. If the question is too easy (e.g., “What is the capital of France?”), the inquiry process collapses because there is no need for collective investigation. Performance in SOLE is maximized when the question is intellectually provocative but accessible via search tools.
4. The Zone of Proximal Development (ZPD) and Flow
Educational research often applies Vygotsky’s Zone of Proximal Development to question difficulty in IBL.
- The Correlation: Performance is highest when the question difficulty is slightly above the student’s current independent ability.
- The “Flow” State: Researchers like Csikszentmihalyi argue that if difficulty is too high relative to skill, students experience anxiety; if it is too low, they experience boredom. Both lead to poor performance in an inquiry setting.
5. Science Inquiry Synthesis (Minner, Levy, & Century, 2010)
This large synthesis of research published between 1984 and 2002 looked at the impact of inquiry-based science instruction.
- Finding: Students performed better on conceptual understanding tasks when they were challenged with higher-difficulty investigations (active thinking and drawing conclusions from data) compared to passive, low-difficulty tasks.
- Conclusion: There is a clear positive correlation between the cognitive challenge of the inquiry and the retention of complex concepts.
Summary of Research Findings:
| Question Type | Difficulty Level | Typical Student Performance | Necessary Condition |
|---|---|---|---|
| Factual/Closed | Low | High Accuracy / Low Retention | None |
| Moderate/Guided | Medium | High Retention / High Engagement | Basic Scaffolding |
| Complex/Big Question | High | Deep Understanding / Innovation | Group Collaboration / Scaffolding |
| Ambiguous/Extreme | Very High | Low Performance / Frustration | High-level Expert Guidance |
Key References for your Academic Writing:
- Chin, C., & Brown, D. E. (2002). Student-generated questions: A meaningful aspect of learning in science. International Journal of Science Education, 24(5), 521–549.
- Lazonder, A. W., & Harmsen, R. (2016). Meta-analysis of inquiry-based learning: Effects of guidance. Review of Educational Research, 86(3), 681–718.
- Minner, D. D., Levy, A. J., & Century, J. (2010). Inquiry-based science instruction—what is it and does it matter? Results from a research synthesis years 1984 to 2002. Journal of Research in Science Teaching, 47(4), 474–496.
- Mitra, S. (2012). Beyond the Hole in the Wall: Discover the Power of Self-Organized Learning. TED Books.
- Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
Research method
- Remake the student feedback form. After each presentation, students should complete a more substantial form, including free writing on their own performance. Any changes to the BQ difficulty assessment? Is the Likert-scale question enough? AI says the following:
When you’re investigating whether the difficulty of the Big Question (BQ) in a Self-Organised Learning Environment (SOLE) correlates with students’ performance, you’re essentially asking two linked research questions:
- Does task difficulty predict or influence learning outcomes?
- If so, how can we measure both difficulty and performance reliably?
Because SOLE research itself (in mainstream educational literature) is still relatively niche compared to other inquiry-based learning studies, you won’t find many large quantitative studies specifically on BQ difficulty and performance — but there are several strategies and frameworks from educational research you can adapt.
Here are effective research strategies you can use — plus tools and measures to operationalize the key variables:
🧠 1. Clearly Define and Operationalize Your Variables
📌 Big Question Difficulty
You need a way to measure and rate how difficult each BQ is for learners. Options include:
- Expert Rubric or Teacher Ratings: Create a rubric based on Bloom’s Taxonomy or cognitive complexity to rate each BQ’s difficulty (e.g., recall vs. analysis vs. synthesis).
- Student Perception Ratings: After a SOLE session, have students rate how difficult they found each BQ on a Likert scale.
- Task Complexity Metrics: Use cognitive complexity frameworks (e.g., SOLO taxonomy or Bloom’s levels) to classify questions.
  - The SOLO taxonomy describes increasing levels of complexity in understanding (pre-structural → extended abstract).
📍 Creating an explicit and detailed rubric ensures consistency in judgments and enables later statistical analysis.
📌 Student Performance
SOLE typically emphasizes open inquiry and problem solving, so a single summary score may miss important aspects of learning. Consider multiple performance indicators:
- Learning Artifacts: Depth and accuracy of answers, quality of presentations, written reflections.
- Process Metrics: Evidence of metacognitive strategy use or collaboration.
- Pre/Post Tests: Short assessments before and after SOLE sessions to measure gains.
Many SOLE studies measure metacognitive or motivational outcomes, which can complement performance metrics. For example, one quasi-experimental study reported that SOLE pedagogy improved skills such as problem-solving and empathy.
📊 2. Choose the Right Research Design
📌 Correlational Design
This is the most direct way to test if question difficulty correlates with performance:
- Collect measures of BQ difficulty and student performance across many sessions.
- Use correlation coefficients (e.g., Pearson’s r) and regression models to see whether higher difficulty predicts performance differences.
This design does not prove causation but can show meaningful relationships.
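As a minimal sketch of this analysis (all data below are invented purely for illustration; Spearman's rho is included because rubric totals are ordinal):

```python
# A minimal sketch, assuming one row per SOLE session with a rubric-based
# difficulty total (4-16) and a mean performance score; all values invented.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.DataFrame({
    "bq_difficulty": [6, 9, 12, 14, 8, 11, 13, 7],
    "performance":   [82, 78, 70, 61, 80, 74, 65, 85],
})

r, p = pearsonr(df["bq_difficulty"], df["performance"])
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# Spearman's rho is worth reporting alongside r, since rubric totals are
# ordinal and the relationship may be monotonic rather than strictly linear.
rho, p_s = spearmanr(df["bq_difficulty"], df["performance"])
print(f"Spearman rho = {rho:.2f}, p = {p_s:.3f}")
```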
📌 Quasi-Experimental or Mixed Methods
If you can manipulate BQ difficulty (e.g., randomly assign students to easier vs. harder BQs), you can compare performance:
- Pre/Post Comparisons: Are learning gains different for more vs. less challenging questions?
- Control Variables: Control for student prior ability, motivation, or topic familiarity.
Because SOLE is student-driven, mixed methods (quantitative + qualitative) often capture richer evidence, especially if performance isn’t fully reflected in exam scores.
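If BQ difficulty can in fact be manipulated, the pre/post comparison might look like this sketch (gain scores invented; a simple two-condition design is assumed):

```python
# A sketch of comparing learning gains across conditions; the gain scores
# below are invented, and the two-group design is an assumption.
import numpy as np
from scipy.stats import ttest_ind

gains_easy = np.array([8, 12, 10, 7, 11, 9])    # post - pre, easier BQs
gains_hard = np.array([14, 9, 16, 12, 15, 11])  # post - pre, harder BQs

t, p = ttest_ind(gains_easy, gains_hard, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3f}")
# To control for prior ability, a regression on post-test scores with
# pre-test and condition as predictors (ANCOVA) is the usual next step.
```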
🧪 3. Use Validated Instruments Where Possible
- Metacognitive Scales: Instruments like MSAS (used in some SOLE research) can track changes in metacognitive processes alongside performance.
- Self-Report Surveys: These help contextualize performance data with students’ perceptions of challenge.
- Rubrics for Learning Outcomes: Rubrics aligned to learning objectives improve measurement reliability.
📈 4. Analyze with Appropriate Statistical Tools
- Correlation Coefficient: Measures degree of linear association between BQ difficulty and performance.
- Regression Analysis: Helps control for confounding variables (e.g., prior knowledge, engagement).
- Hierarchical/Multilevel Models: Useful if data are nested (e.g., students within classes).
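For the nested case, a random-intercept model is the usual starting point. A sketch with statsmodels, where the file and column names are assumptions:

```python
# A sketch of a random-intercept model for students nested within classes,
# using statsmodels; the file and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sole_sessions.csv")  # hypothetical: one row per student-session

model = smf.mixedlm("performance ~ bq_difficulty", data=df, groups=df["class_id"])
result = model.fit()
print(result.summary())  # fixed effect of difficulty + class-level variance
```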
🧩 5. Contextualize with Theory
Situate your work within broader literature on inquiry-based learning and constructivist approaches. For example:
- Inquiry-based learning research emphasizes that open questions can improve higher-order thinking skills.
- The Visible Learning meta-analysis highlights that formative evaluation and student self-assessment often correlate strongly with achievement.
Linking SOLE research to these broader frameworks adds conceptual rigor to your study.
🧠 Practical Steps for Implementation
- Design a BQ difficulty rubric (e.g., based on Bloom’s or SOLO taxonomy).
- Collect performance data across multiple SOLE sessions.
- Triangulate with student self-reports and maybe observational coding.
- Analyze relationships statistically to see if difficulty predicts performance.
- Interpret results with both quantitative and qualitative evidence.
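To make the data collection step concrete, here is one possible shape for the sheet, sketched in Python; every column name is an assumption to be adapted to the course:

```python
# One possible "tidy" layout for the data collection sheet; every column
# name is an assumption to be adapted, then imported into SPSS or R.
import pandas as pd

columns = [
    "session_id",       # SOLE session
    "student_id",       # anonymized student code
    "bq_text",          # the Big Question asked
    "rubric_total",     # 4-16, from the difficulty rubric
    "perceived_diff",   # 1-7 Likert, student self-report
    "performance",      # e.g., presentation rubric score
    "reflection_code",  # qualitative code from the free-writing section
]
pd.DataFrame(columns=columns).to_csv("sole_data_collection.csv", index=False)
```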
🧾 Summary of Research Strategies
| Strategy | Purpose |
|---|---|
| Rubric-based scoring of BQ difficulty | Creates a consistent measure of task challenge |
| Multiple performance metrics | Captures the full impact of learning (beyond test scores) |
| Correlational and regression analysis | Tests predictive relationships |
| Mixed-methods approaches | Provides depth and context |
| Use of validated scales | Enhances reliability and comparability |
If you want, I can help you build a sample rubric for Big Question difficulty or draft a research design outline (with instruments and analysis plan) tailored to your course and institution.
build a sample rubric for Big Question difficulty
Great — since you’re teaching SOLEs at a Japanese university, I’ll design this so it works in a higher-education EFL context and is usable for quantitative analysis in a paper.
Below is a research-ready Big Question (BQ) Difficulty Rubric you can use both for instructional planning and statistical coding.
Big Question Difficulty Rubric (SOLE – University Level)
Purpose
To operationalize and quantify the cognitive and linguistic difficulty of SOLE Big Questions for correlational research.
Structure
Each Big Question is rated across 4 dimensions, each scored 1–4.
Total possible score: 4–16.
You can treat:
- 4–7 = Low difficulty
- 8–11 = Moderate difficulty
- 12–16 = High difficulty
Dimension 1: Cognitive Complexity
(Adapted from Bloom’s Taxonomy)
| Score | Descriptor | Example Indicators |
|---|---|---|
| 1 | Recall / Identification | Define, list, identify facts |
| 2 | Explanation / Understanding | Explain causes, summarize information |
| 3 | Analysis / Comparison | Compare perspectives, analyze relationships |
| 4 | Synthesis / Evaluation / Creation | Propose solutions, evaluate evidence, design something new |
Example
1 → “What is climate change?”
4 → “How should Japan redesign its energy policy to address climate change?”
Dimension 2: Knowledge Structure Complexity
(Adapted from SOLO Taxonomy)
| Score | Descriptor | Example Indicators |
|---|---|---|
| 1 | Single idea (Unistructural) | One concept required |
| 2 | Multiple independent ideas (Multistructural) | Several facts needed but not integrated |
| 3 | Integrated understanding (Relational) | Must connect ideas meaningfully |
| 4 | Abstract generalization (Extended Abstract) | Must transfer to new contexts or theorize |
This dimension helps distinguish between “many facts” and “deep conceptual integration.”
Dimension 3: Linguistic Demand (EFL-Specific)
| Score | Descriptor | Indicators |
|---|---|---|
| 1 | Low language load | Concrete vocabulary, short responses possible |
| 2 | Moderate language load | Some abstract vocabulary required |
| 3 | High language load | Requires explanation with academic vocabulary |
| 4 | Very high language load | Requires argumentation, hedging, nuanced expression |
This dimension is especially important in Japanese university SOLE contexts, where cognitive difficulty and language difficulty may not align.
Dimension 4: Ambiguity / Openness
| Score | Descriptor | Indicators |
|---|---|---|
| 1 | Closed, clear answer | One correct answer |
| 2 | Mostly structured | Limited range of acceptable answers |
| 3 | Open-ended | Multiple plausible responses |
| 4 | Highly open / Ill-structured problem | No single correct answer; requires criteria setting |
SOLE research often suggests that ill-structured problems increase inquiry and collaboration — but they may also increase cognitive load.
Example Scoring
Big Question:
“How will AI change university education in Japan over the next 20 years?”
| Dimension | Score |
|---|---|
| Cognitive Complexity | 4 |
| Knowledge Structure | 3 |
| Linguistic Demand | 3 |
| Openness | 4 |
| Total | 14 (High difficulty) |
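For coding many BQs consistently, the arithmetic above can be scripted. A minimal sketch that reproduces this worked example (the function name and layout are illustrative):

```python
# A minimal sketch: sum the four dimension scores (each 1-4) and map the
# 4-16 total onto the bands defined above.
def bq_difficulty(cognitive, knowledge, linguistic, openness):
    total = cognitive + knowledge + linguistic + openness  # range 4-16
    if total <= 7:
        band = "Low"
    elif total <= 11:
        band = "Moderate"
    else:
        band = "High"
    return total, band

print(bq_difficulty(4, 3, 3, 4))  # reproduces the example above: (14, 'High')
```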
Making It Research-Ready
1. Inter-Rater Reliability
To establish reliability:
- You and at least one colleague rate each BQ independently.
- Calculate Cohen's kappa or an intra-class correlation on the ratings.
This strengthens publication credibility.
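A sketch of the kappa calculation, assuming two raters have assigned each BQ to a difficulty band (the labels and the scikit-learn call are illustrative choices, not requirements):

```python
# A sketch of inter-rater agreement, assuming two raters assigned each BQ
# to a difficulty band; labels and scikit-learn usage are assumptions.
from sklearn.metrics import cohen_kappa_score

rater_a = ["Low", "High", "Moderate", "High", "Low", "Moderate"]
rater_b = ["Low", "High", "Moderate", "Moderate", "Low", "Moderate"]

print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
# For the 4-16 totals (ordinal), weighted kappa or an intra-class
# correlation may be more appropriate than unweighted kappa.
```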
2. Student Perceived Difficulty (Optional Add-On)
After each SOLE session:
“How difficult was today’s Big Question?” (1–7 Likert scale)
This allows you to compare:
- Objective difficulty (rubric score)
- Subjective difficulty (student rating)
- Performance outcomes
Sometimes perceived difficulty predicts learning more strongly than objective difficulty.
3. Statistical Use
You can:
- Treat total score (4–16) as a continuous variable.
- Or treat each dimension separately in regression: Example: Performance = β1(Cognitive) + β2(Language) + β3(Openness) + ε
This may reveal that linguistic demand predicts performance more strongly than cognitive complexity in EFL settings.
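A sketch of that per-dimension regression with statsmodels, reusing the hypothetical data collection sheet from earlier; the column names echo the rubric dimensions and are assumptions:

```python
# A sketch of the per-dimension regression, mirroring
# Performance = b1(Cognitive) + b2(Language) + b3(Openness) + error;
# column names echo the rubric dimensions and are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sole_data_collection.csv")  # hypothetical sheet from above

model = smf.ols("performance ~ cognitive + language + openness", data=df)
result = model.fit()
print(result.summary())  # compare coefficients across the three dimensions
```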
Why This Works for Publication
- Anchored in Bloom + SOLO (strong theoretical foundation)
- Separates cognitive vs linguistic difficulty (important in Japan)
- Allows both correlational and regression analysis
- Works with mixed methods
If you’d like, I can next help you:
- Turn this into a short “Methodology – Instrument” section for a Japanese university kiyō (紀要, departmental research bulletin)
- Design a student performance rubric aligned to this difficulty scale
- Build a simple data collection sheet for SPSS/R analysis
Using the scale above, rate the following questions: 1. What is the purpose of idioms? 2. How has privatization impacted the pace of technological innovation in space exploration?
Great — I’ll rate them using the 4-dimension rubric exactly as constructed (Cognitive Complexity, Knowledge Structure, Linguistic Demand, Openness; 1–4 each).
1️⃣ “What is the purpose of idioms?”
Dimension Ratings
| Dimension | Score | Rationale |
|---|---|---|
| Cognitive Complexity | 2 | Requires explanation/understanding, not evaluation or synthesis |
| Knowledge Structure | 2 | Multiple ideas (culture, metaphor, communication), but limited integration required |
| Linguistic Demand | 2 | Some abstract vocabulary (“purpose”), but manageable for university EFL |
| Openness | 2 | Some variation possible, but fairly constrained conceptually |
Total Score: 8 / 16
Moderate Difficulty (Lower Range)
This is conceptually manageable and language-accessible. It is unlikely to overload students cognitively, but it may not strongly stimulate deep inquiry unless scaffolded.
2️⃣ “How has privatization impacted the pace of technological innovation in space exploration?”
Dimension Ratings
| Dimension | Score | Rationale |
|---|---|---|
| Cognitive Complexity | 3 | Requires analysis of cause-and-effect relationships |
| Knowledge Structure | 3 | Must integrate economics, policy, and technological development |
| Linguistic Demand | 4 | Heavy academic vocabulary: “privatization,” “pace,” “technological innovation” |
| Openness | 3 | Multiple defensible positions; evidence-based argument expected |
Total Score: 13 / 16
High Difficulty
This is cognitively demanding and linguistically heavy in a Japanese university EFL context. The linguistic load alone may significantly affect performance outcomes.
Research Insight (Important for Your Study)
Notice something interesting:
- Question 1 is conceptually simpler but still moderately open.
- Question 2 is cognitively complex and linguistically dense.
If you find performance differences between these BQs, your regression model may show:
- Linguistic demand is a stronger predictor than cognitive complexity.
- Or openness predicts collaboration quality more than test performance.
In a SOLE setting, the second question may produce:
- Richer discussion
- But lower linguistic accuracy scores
- And possibly higher perceived difficulty
