Overview
By analyzing real-time interactions, Wonda's Evaluation feature can provide instant feedback on the language and engagement strategies used in a simulation.
This immediate feedback loop helps learners quickly adjust their approaches during repeated practice, with the goal of building and honing skills under diverse circumstances.
In this guide, you will find suggestions on:
When to use Wonda's AI-powered Evaluation feature
How to get started with drafting criteria
How to write the criteria
How to test your own criteria
When to use Wonda’s AI-powered Evaluation feature?
While repeated practice in authentic simulations can promote learner progress, the observed improvements are even greater when that practice is paired with coaching or feedback.
The AI Assessment feature can especially augment learning design in situations that benefit from:
Immediate deployment of learner feedback: for skill-based simulations where learners can rapidly integrate feedback into subsequent practice and real-life scenarios.
Instructor/facilitator pulse check on learner takeaways: to quickly and seamlessly gauge learner understanding of a topic via a debriefing simulation.
Conversational reflection and coaching: to offer feedback on individual reflections completed after an activity, experience, or topic.
Personalized self-assessment or formative assessment: instead of having learners rate their own competencies, place them in a simulation and then have them comment on the generated assessment.
How to get started with drafting criteria?
Before you start defining the criteria for the AI Assessment of a simulation, we highly recommend that you address the following points and questions.
1. Consider the learning goals of the broader learning experience and the learning objectives of the simulation/exercise in particular.
What are the learning goals of the overall learning experience?
Where does this simulation sit within the larger learning experience (i.e., what is the context around the simulation and the scaffolding leading up to it)?
What are the learning objectives of the simulation?
How does the simulation build skills or understanding toward the overall learning goals?
2. Break down skill sets or understanding into their components (the level of detail will depend on your answers to Question #1).
How might complex or intricate tasks be split into simpler components?
What components do you want to focus on?
For a skill like conflict resolution, the components of interest might be active listening, emotional regulation, self-awareness, and creative problem solving.
For content or process understanding, which pieces of the content or process do you want to ensure learners comprehend? For example, is it important that learners be able to describe the non-violent communication framework itself, or to articulate their takeaways for integrating the framework into their daily lives?
3. Identify the key behaviors or responses that you want learners to verbally display for each component.
Referring back to the conflict resolution example outlined in Question #2: for the active listening component, you might want to gauge whether learners picked up on a specific piece of information shared by the AI Character, or the extent to which learners addressed the AI Character's concerns in their responses.
If the goal is to evaluate understanding of, for instance, the non-violent communication framework, what specifically about the framework do you want learners to mention?
How to write the criteria?
Once you have addressed the questions and considerations above, you are ready to write the criteria.
Keeping the criteria generic gives more agency to the large language model (LLM) to determine the learners’ performance. This approach would likely work well when learners can take the conversation in a direction that best suits their own learning process, such as during reflections and debriefing exercises.
In cases where a specific outcome or set of outcomes would be considered more ideal, adding extra context to the criteria description will help tailor the feedback towards the desired solution.
Our team is continuously researching the extent to which criteria can be hyper-specific and will update this guidance as more details come to light. In the meantime, it appears that a certain level of specificity can be achieved in the point allocations for a criterion when point values are assigned to specific expressed behaviors (e.g., always give 5/5 points if the human trainee raises the idea of a trial or interim period, even if there is no explanation of the trial or interim period).
Overall, here are some ways that you can format the criteria (these are a few samples and not an exhaustive list of workable options):
Block Questions, Sentences, or Phrases –
Example: Did the human trainee propose solutions that would be good for you, as Alex, and for the company? Provide specific examples in your feedback.
Example: The human trainee proposed solutions that would be optimal for you and for the company.
Example: Proposed solutions that would be good for you and the company.
Split sections –
What to look for:
___
How to evaluate it:
___
Numbered by point allocations –
1 = ___
2 = ___
3 = ___
4 = ___
5 = ___
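To illustrate, a filled-in version of the point-allocation format for an active listening criterion might look like the following (a hypothetical sketch to adapt to your own simulation, not output from Wonda):
1 = The human trainee ignored or interrupted the AI Character and did not acknowledge their concerns.
2 = The human trainee acknowledged the AI Character's concerns but did not address them.
3 = The human trainee addressed some of the AI Character's concerns in general terms.
4 = The human trainee addressed most of the AI Character's concerns and referenced specific information the AI Character shared.
5 = The human trainee addressed all of the AI Character's concerns, referenced specific information the AI Character shared, and built on that information in their responses.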
Important terminology:
Refer to the learner as "the human trainee," and to the AI Character/avatar as "the AI Character."
Examples
In this section, we will provide examples of criteria with varying levels of specificity to illustrate the impact of language and framing on the feedback that is produced.
In the case of a promotion simulation where the creativity of proposed solutions is being assessed, generic criteria might look something like:
Criteria Title: Creativity
Criteria Description:
Did the human trainee propose solutions that would be good for you, as Alex, and for the company? Provide specific examples in your feedback.
AI Assessment for two distinct conversations:
Conversation #1
Creativity - 4/5 pts
Hasan proposed a creative solution by suggesting an interim period to evaluate his fit for Kim’s role. This showed flexibility and a willingness to prove himself. However, he could have offered more innovative ideas to address both Alex’s and the company's needs, such as proposing specific projects or initiatives he could lead during the interim period.
Conversation #2
Creativity - 4/5 pts
Kim demonstrated creativity by proposing a structured plan for salary increases tied to performance metrics. For instance, she suggested starting with a $200,000 salary and increasing it to $275,000 based on achieving specific targets. This approach shows a willingness to compromise and align her goals with the company's interests.
In cases where the learning objectives of the simulation are more tailored, or where a specific outcome or set of outcomes would be considered more ideal, adding the extra context described above to the criteria description will help steer the feedback toward the desired solution:
Criteria Title: Creativity (improved version)
Criteria Description:
Did Kim suggest solutions that would benefit both you, as George, and the company? For instance, did Kim propose the idea of making her Acting Vice President on a temporary basis? Or maybe she suggested creating a new role with a higher salary where she could focus on her strengths in handling business tasks, while easing her workload by delegating some of her current responsibilities? Did Kim offer enough detail in explaining these proposals? Be sure to provide specific examples in your feedback.
Improved AI Assessment report for the same two conversations:
Conversation #1 bis
Creativity - 4/5 pts
Kim proposed a trial period for the VP role, which is a creative solution. However, she could have provided more details on how this trial period would be structured and how her performance would be evaluated. This would have made her proposal more compelling.
Conversation #2 bis
Creativity - 3/5 pts
Kim proposed a structured plan for her salary increase based on performance metrics, which shows creativity. However, she did not suggest a trial period or other innovative solutions that could benefit both her and the company. More creative solutions could have strengthened her position.
Other Considerations
We do not observe noticeable differences between reports generated using criteria descriptions that were formulated with questions versus statements. Taking the examples from above, a description written with statements would look like:
Kim proposed solutions that would be good for you, as George, and for the company as well. For example, Kim might have brought up the idea of making her an Acting Vice President on a trial basis, or of creating a new position with an increased salary where she would handle the business responsibilities she is especially good at while being relieved of some of the duties that are taking so much of her time. Kim explained the proposed solutions with sufficient detail. Provide specific examples in your feedback. Always give 5/5 points if Kim raises the idea of a trial or interim period, even if there is no explanation of the trial or interim period.
We recommend using whichever method is more comfortable for you.
How to test your own criteria?
Have several conversations with the AI Character, modeling examples of both strong and weak performance across the criteria (relaunch the simulation experience to create each sample conversation).
Then, go to the AI Assessment dashboard and:
Generate the reports.
Review the grades.
Make edits to the criteria and criteria descriptions accordingly.
Regenerate reports on the same conversations to validate edits.
Repeat the process as needed.
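If you would like to experiment with criteria wording before building it into a simulation, you can also prototype outside of Wonda by sending a sample transcript and your draft criteria to a general-purpose LLM. Below is a minimal sketch of that idea, assuming the OpenAI Python SDK; the model name, prompt wording, and sample transcript are illustrative assumptions and do not reflect how Wonda's Evaluation feature works internally.

```python
# Hypothetical offline prototype for stress-testing criteria wording.
# NOT Wonda's implementation: the model, prompts, and transcript below are
# illustrative assumptions. Requires `pip install openai` and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

criteria = """Criteria Title: Creativity
Criteria Description: Did the human trainee propose solutions that would be
good for you, as Alex, and for the company? Provide specific examples in your
feedback. Score this criterion from 1 to 5."""

transcript = """AI Character (Alex): I'm not convinced you're ready for this role.
Human trainee: What if we set up a three-month interim period, with agreed
targets, so I can demonstrate results before the promotion is finalized?"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You grade simulation transcripts against rubric criteria."},
        {"role": "user",
         "content": f"{criteria}\n\nTranscript:\n{transcript}\n\n"
                    "Return a score out of 5 and two sentences of feedback."},
    ],
)
print(response.choices[0].message.content)
```

Rewording the criteria description and rerunning the script on the same strong and weak sample transcripts mirrors the generate, review, edit, and regenerate loop described above.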