The Effect of Level-Marked Tasks on Students’ Mathematics Self-Efficacy and Persistence—Investigating a Tiered Activity

In school mathematics, tiering tasks according to students’ readiness levels is a common approach to adapting content to address students’ varying abilities. To distinguish between different tiers, the tasks are commonly marked by their level of difficulty, as in so-called level-marked tasks, a strategy used in differentiated instruction to help students gain mastery and improve performance. However, less is known about whether and to what extent the marking of the tasks works as intended. Because self-efficacy is regarded as a significant predictor of students’ performance in mathematics, the effects of level marking of mathematics tasks on students’ self-efficacy and persistence were investigated. The online survey part of an experimental design was used to collect data from 436 lower secondary students in Norway. Independent samples t-tests and Mann–Whitney U tests revealed that, when students across groups were given the same mathematics tasks with and without level markings, tasks marked as difficult reduced both students’ mathematics self-efficacy and their persistence, even for students with high baseline self-efficacy. Although more research is needed to fully understand how the level markings of mathematics tasks affect students’ self-efficacy and persistence, the reported results have implications for both mathematics teachers and textbook authors.


Introduction
Self-efficacy, initially defined by Bandura (1997) as "beliefs in one's capabilities to organize and execute the courses of action required to produce given attainments" (p. 3), is very advantageous to possess (Bandura, 1993). An ever-growing body of research has reported the outcomes associated with possessing self-efficacy, such as performance and persistence (Bong & Skaalvik, 2003; Collins, 1984; Jacobs et al., 1984; Lyman et al., 1984; Multon et al., 1991; Óturai et al., 2023; Pajares, 1996; Pajares & Miller, 1995; Schunk, 1991; Sexton & Tuckman, 1991; Skaalvik et al., 2015; Zakariya, 2021). In line with Sasidharan and Kareem (2023), we argue that in mathematics, a school subject that many students tend to perceive as difficult (Moyer et al., 2018), high self-efficacy and persistence are of the utmost importance. If students' self-efficacy is high, they might exhibit a higher degree of participation in mathematics activities, which can increase their well-being (Sasidharan & Kareem, 2023). Therefore, we assert that it is important to provide students with opportunities to develop self-efficacy, and hence persistence, in mathematics.
The focus of this paper is level marking of mathematics tasks. To be clear, we are not investigating how task difficulty levels affect students' self-efficacy and persistence, but rather examining how different labels themselves (easy, medium, and difficult) affect students' self-efficacy and persistence.
To our knowledge, even though level marking of mathematics tasks is a widespread approach, little research has reported on the consequences of marking tasks with different levels of difficulty. Therefore, in this paper, we report on a study from Norway investigating how marking tasks as easy, medium, and difficult affects students' self-efficacy and persistence in mathematics. We assert that this study is important for helping mathematics teachers understand students' self-efficacy and persistence when level-marked tasks are used in tiered activities. In addition, the results may be useful for policymakers, textbook authors, and teachers when they decide whether to continue, or introduce, the use of level-marked tasks as a strategy to help students gain mastery and improve their performance.
Self-Efficacy and Level Marking of Mathematics Tasks
Bandura (1997) distinguished between perceived self-efficacy and outcome expectations. While outcome expectations are a person's judgment of the likely consequences that a particular performance will produce (Bandura, 1997, p. 21), in this paper, we focus on students' perceived mathematics self-efficacy, which is understood as "a situational or problem-specific assessment of an individual's confidence in her or his ability to successfully perform or accomplish a particular task or problem" (Hackett & Betz, 1989, p. 262).
Self-efficacy is a multifaceted phenomenon that varies in strength, level, and generality (Bandura, 1997). In this paper, we specifically measure students' strength of self-efficacy, which refers to how strong a person's belief is in completing a given task (Bandura, 1997). According to Street et al. (2017), students' self-efficacy strength is affected by how they assess a specific task (generality) and how difficult they perceive the task to be (level). This is supported by other researchers, who found that students' self-efficacy is affected by a task's difficulty level (Chen & Zimmerman, 2007) and that students who find a given task difficult report lower self-efficacy compared to students who find the task easy (Collins, 1984; Liu et al., 2020). However, how students perceive a task's difficulty level can differ between students, and such perceptions do not necessarily reflect an "objective" level of difficulty defined by a teacher (Street et al., 2017). Liu et al. (2020) found that even if students reported higher self-efficacy when they were given easy tasks (marked as easy) compared to when they were given difficult tasks (marked as difficult), there was a "difficulty-efficacy mismatch" (p. 419), as their performance was lower for easy tasks compared to that for difficult tasks. From this, the researchers concluded that students' self-efficacy can be affected by perceived task difficulty and not by assessments of tasks in relation to one's own capabilities. Taken together, we can infer that giving students information about a task's difficulty level (as in the level marking of mathematics tasks) may affect how difficult they perceive the task to be (level of self-efficacy), which in turn may affect their strength of self-efficacy. However, to the best of our knowledge, this phenomenon has not yet been researched.

Persistence and Level Marking of Mathematics Tasks
Building on a recent literature review on the issue, DiNapoli (2023) suggested that persistence should be defined as the "voluntary continuation of a goal-directed action in spite of obstacles, difficulties, or discouragement," a definition originally set forth by Peterson and Seligman (2004, p. 229). Persistence is linked to students' short-term goals (e.g., solving a difficult mathematics task) and not long-term goals (e.g., getting a master's degree), similar to the idea of grit (DiNapoli, 2023). According to DiNapoli (2023), focusing on persistence when studying students' habits empirically will provide information about students' engagement time. In particular, in experimental studies like ours, DiNapoli (2023) found that researchers often operationalize persistence as "the amount of time spent on working on a difficult task" (p. 20).
Considering this operationalization of persistence, a task's degree of difficulty is decisive in how best to measure persistence, especially considering how "more competent individuals may find solutions more quickly than less competent individuals" (Shen et al., 2016, p. 42). Multon et al. (1991) and DiNapoli (2023) suggested that persistence should be measured during challenging circumstances. This is in line with Montague and Applegate (2000), who found that when a task is easy, students who struggle with mathematics spend more time trying to solve the task compared to high achievers, and when a task is hard, the first group of students tends to give up, while high achievers persist longer. Similarly, in a study conducted by Spielberg and Azaria (2021), students' persistence was examined by revealing the true difficulty of some riddle tasks. The results revealed no statistically significant difference between students who received easy tasks marked as easy compared to students who received the same tasks without information about the level of difficulty. In addition, when difficult tasks were marked according to the level of difficulty, students' time spent coming up with a solution was significantly higher compared to when students were given the same tasks without a difficulty level marking. Interestingly, the students' solution ratio was not influenced by revealing the tasks' true difficulty level. From this, we inferred that when a task is solvable, as in Spielberg and Azaria (2021), it is difficult to conclude whether a longer amount of time spent solving a task means that students are more persistent or less efficient. Hence, to avoid this confounder, the task needs to be impossible to solve. Therefore, a common approach is to measure persistence by recording the time students spend attempting to solve an impossible task (e.g., Brown & Inouye, 1978; Collins, 1984; Jacobs et al., 1984; Shen et al., 2016). As suggested by Stein and Burchartz (2006), an impossible task is one whose only solution is "there is no solution."
Published by IDEAS SPREAD

Persistence and Self-Efficacy
A large body of research has examined the relationship between self-efficacy and persistence (Bong & Skaalvik, 2003; Collins, 1984; Jacobs et al., 1984; Lyman et al., 1984; Multon et al., 1991; Schunk, 1991; Sexton & Tuckman, 1991; Skaalvik et al., 2015), but few have investigated this relationship in the context of mathematics tasks using an experimental design. Three rare examples come from Sexton and Tuckman (1991), Collins (1984), and Jacobs et al. (1984). Sexton and Tuckman (1991) measured self-efficacy and persistence when completing a series of mathematics tasks. At the beginning of the survey, generalized self-efficacy and previous performance were the best predictors of persistence, but at the end of the survey, specific self-efficacy and recent experiences completing the survey were the best predictors of persistence (Sexton & Tuckman, 1991). In other words, experience completing the survey changed the way students behaved, and their new behavioral patterns served as a source that influenced their choice to persist in the future. In the study conducted by Collins (1984), students were divided into three achievement groups (low, average, and high achievers) and divided again into two subgroups according to their self-efficacy (high or low). All students were given two impossible tasks. Collins (1984) found that for low- and average-achieving groups, students with high self-efficacy spent more time trying to solve the tasks compared to students with low self-efficacy, while for students in the high-achievement group, those with low self-efficacy spent more time solving the tasks compared to students with high self-efficacy. Jacobs et al. (1984) conducted an experimental study and found a strong effect of self-efficacy on students' persistence when they were asked to solve an impossible task that required them to make a tracing that combined a set of lines without lifting the pen from the paper. The students' self-efficacy was manipulated, with one group being told that the task was easy and another that the task was difficult. The students' persistence was measured by the amount of time they spent working on the task. Students who were told that the task was difficult reported lower self-efficacy than those who were told the task was easy. In addition, the results indicated that students who were told that they would do well on the task persisted significantly longer than those who were told the opposite. The last study in this list was the one we found to be most similar to ours; however, Jacobs et al. (1984) did not include a control group where participants were not provided any information about the difficulty of the task. None of the three examples discussed included level markings in their designs.

The Present Study
In the present study, we investigated how the level marking of mathematics tasks according to different labels affects students' self-efficacy and persistence. We examined different labels for the same task because previous research found that the level markings of mathematics tasks in textbooks are not always correct (Brändström, 2005; Singh, 2017). In addition, it can be problematic to mark tasks according to levels of difficulty; Krauthausen (2018) claimed that the level of difficulty is a subjective evaluation. From this, we learned that the level marking of a task might not always be perceived as correct.
At the beginning of the survey, the task had no level marking, and students' self-efficacy related to this task is referred to as their "baseline self-efficacy." Considering students' baseline self-efficacy (low or high), we also investigated how the level marking affected students' self-efficacy and persistence. Hence, we raised the following four research questions (RQs):
RQ1: How does the level marking of a mathematics task that is impossible to solve affect students' self-efficacy?
RQ2: Considering students' baseline self-efficacy, how does the level marking of a mathematics task that is impossible to solve affect students' self-efficacy?
RQ3: How does the level marking of a mathematics task that is impossible to solve affect students' persistence?
RQ4: Considering students' baseline self-efficacy, how does the level marking of a mathematics task that is impossible to solve affect students' persistence?
Given that previous research found a negative effect of "difficult" marking of tasks (which were considered easy) on students' self-efficacy (Herset et al., 2023) and a positive effect on students' self-efficacy when they were told the task was easy (Jacobs et al., 1984), we put forward the following two directional hypotheses in accordance with RQ1:
H1A: Students' self-efficacy is significantly reduced when they are faced with the same impossible task twice, first without a level marking and next with a "difficult" marking.
H1B: Students' self-efficacy is significantly increased when they are faced with the same impossible task twice, first without a level marking and next with an "easy" or "medium" marking.
Since we were unable to find previous research on the relationship between task level marking and baseline self-efficacy on students' self-efficacy, and since students' perceived task difficulty level might differ from the task's actual difficulty level, which is important when students form their efficacy beliefs (Street et al., 2017, 2022), we put forward the following two non-directional hypotheses in accordance with RQ2:
H2A: For students with low baseline self-efficacy, the level marking of an impossible task affects their self-efficacy.
H2B: For students with high baseline self-efficacy, the level marking of an impossible task affects their self-efficacy.
To study students' behavioral persistence, we used an impossible mathematics task. This extends previous research on the effect of level-marked tasks on students' self-efficacy and persistence. We expected the effect of a "difficult" marking on time spent on an impossible task to be negative (Herset & El Ghami, 2022; Jacobs et al., 1984) and the effect of an "easy" marking on persistence to be positive (Jacobs et al., 1984). However, there is also support for the opposite (Spielberg & Azaria, 2021). Since the literature is not clear, in accordance with RQ3, we put forward a non-directional hypothesis:
H3: The marking of an impossible mathematics task affects students' time spent on the task.
Previous research found that students' perceptions of a task's difficulty affect their self-efficacy (Street et al., 2017), and given the positive relationship between self-efficacy and persistence (Bong & Skaalvik, 2003; Collins, 1984; Jacobs et al., 1984; Lyman et al., 1984; Multon et al., 1991; Schunk, 1991; Skaalvik et al., 2015), we expected students' persistence to depend on both their level of self-efficacy and the level marking of the mathematics task. Therefore, we proposed two hypotheses in accordance with RQ4:
H4A: For students with low baseline self-efficacy, the level marking of an impossible task affects their time spent on the task.
H4B: For students with high baseline self-efficacy, the level marking of an impossible task affects their time spent on the task.

Materials and Methods
In this paper, we draw on a subset of data collected from a larger project that aimed to investigate if and how the level marking of mathematics tasks affects students' self-efficacy, persistence, performance, and choice of tasks.
Acknowledging that "more competent individuals may find solutions more quickly than less competent individuals" (Shen et al., 2016, p. 42), a survey was created by the first author, inspired by the work of other researchers (Brown & Inouye, 1978; Jacobs et al., 1984; Miele et al., 2022; Shen et al., 2016). The survey included one impossible task placed at the end of the survey, and other tasks measuring factors such as self-efficacy. The purpose of the impossible task was to measure persistence. As in prior research (Brown & Inouye, 1978; Jacobs et al., 1984; Miele et al., 2022; Shen et al., 2016), we measured persistence by quantifying the time a student spent working on a given task. The task reads as follows:
Emil and Sandra went to the shop to buy some fruit. Emil bought two apples and one pear. He paid NOK 30. Sandra bought four apples and two pears. She paid NOK 40. How much does one apple cost?
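To see why the task has no solution, write x for the price of one apple and y for the price of one pear. The task then gives 2x + y = 30 and 4x + 2y = 40; doubling the first equation gives 4x + 2y = 60, which contradicts the second. The following minimal sketch checks this mechanically. The function name `classify_2x2` and its return format are illustrative choices and were not part of the survey materials.

```python
from fractions import Fraction

def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2 (integer inputs)."""
    det = a1 * b2 - a2 * b1
    if det != 0:
        # Cramer's rule gives the single solution.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        return ("unique", x, y)
    # det == 0: the left-hand sides are proportional, so the system is
    # consistent only if the right-hand sides follow the same proportion.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return ("infinitely many", None, None)
    return ("no solution", None, None)

# Emil: 2 apples + 1 pear = NOK 30; Sandra: 4 apples + 2 pears = NOK 40.
print(classify_2x2(2, 1, 30, 4, 2, 40))
```

Had Sandra paid NOK 60 instead, the two equations would carry the same information and the apple price would be undetermined rather than impossible.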
There are, of course, ethical considerations in giving students an impossible task. We agreed, as did Borge (2003), that the utility of the research outweighed any discomfort or risk for the informants. Because solving systems of linear equations with two unknowns where the system has no solution (which is what this task requires of the students) is included in the students' curriculum, we decided that there was no risk of giving them such a task. However, if any of the students felt any discomfort after encountering this task, their teachers were informed and ready to discuss the task with them after data collection.

Data Collection
To ensure we could make generalizations, we used probability and cluster sampling, the latter being justified by the fact that the population of Norway is large and widely dispersed. To avoid a close cluster sample (Cohen et al., 2018), we recruited schools randomly across Norway. We selected 23 public schools and tested all students aged 13-15 years in the selected schools. Unfortunately, because of COVID-19 and school lockdowns, we needed to replace some of the randomly chosen schools with schools closer to the first author's university.
The schools were contacted by the first author, who called each of the chosen schools' principals. If the principals were willing to encourage the school's mathematics teachers to participate, the teachers also needed to agree. We ensured that no students would feel obligated to participate. To ensure that the data collection was done in the same way across all chosen schools, each mathematics teacher was given data collection instructions. For example, participants had to conduct the survey during a lesson (and not outside school) to make sure no one collaborated or used calculators. Before the data were collected, the Norwegian Social Science Data Service (previously NSD, now Sikt) approved the project.

Participants
In total, 436 students in grades 8 and 9 (i.e., ages 13-15 years) responded to the online survey. Some of the data were found to be incomplete or monotonous, indicating that participants had skipped items, so 82 responses were deleted. In addition, 14 response strings were detected as outliers because the participants had spent an excessive amount of time on the survey, and 9 were detected as outliers because the differences in students' self-efficacy were considered extreme. The final sample included 331 students, with an equal distribution of gender, coming from 23 schools from all regions in Norway (4% from Southern Norway, 9% from Western Norway, 30% from Eastern Norway, 10% from Mid-Norway, and 47% from Northern Norway).
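The exclusion counts and regional shares reported above can be tallied in a few lines. This is only a bookkeeping sketch; the dictionary keys are shorthand labels chosen here, and the numbers are the ones stated in the text.

```python
responses = 436

# Number of responses removed at each screening step.
excluded = {
    "incomplete or monotonous": 82,
    "excessive time on survey": 14,
    "extreme self-efficacy difference": 9,
}
final_sample = responses - sum(excluded.values())

# Regional distribution of the final sample, in percent.
region_share = {
    "Southern": 4,
    "Western": 9,
    "Eastern": 30,
    "Mid": 10,
    "Northern": 47,
}

print(final_sample, sum(region_share.values()))
```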

Experimental Design
The survey used in this study was designed for a larger research project and included 10 solvable tasks and 1 impossible task. It was set up carefully with a complex design. In the following, we explain the experimental design by focusing on the impossible task, which is the one we report on in this paper. Figure 1 gives an overview of the five steps.
Figure 1. The Pretest-Posttest Control Group Design and Posttest-Only Control Group Design
In Step 1, we conducted a cluster sampling of participants across schools. In Step 2, the participants were randomly assigned (by a computer) to four groups: one control group (CG) and three experimental groups (EGi, i = 1, 2, and 3). Each group received a different version of the online survey, making this a true experiment. In Step 3 (Set 1), the students were asked not to solve the task but to read the task and respond to the self-efficacy question, "How certain are you that you can solve this problem correctly?", using a 100-point scale ranging from not certain at all (0) to absolutely certain (100), as recommended by the literature (Bandura, 2006; Zakariya et al., 2019). At this stage, the task had no level marking (see Figure 2), and the self-efficacy question was therefore used to measure what we referred to as the students' "baseline self-efficacy." Since the purpose was to measure students' self-efficacy and persistence, we further explain the survey's pretest-posttest control group and posttest-only control group designs (Creswell & Creswell, 2018).
In Step 4 (Set 2), the students were again presented with the task and asked the same self-efficacy question about how certain they were about being able to solve the task correctly. Here, too, the students were instructed not to solve the task. While the CG was given the same task all over again (with no level marking), for the students in the EGs, the task was now marked as easy, medium, or difficult. To avoid the feeling of having received the same task as in Set 1, some words were replaced to make the tasks look different. Figure 2 shows how this played out for the task (e.g., "Casper" was replaced with "Emil" and "store" was replaced with "shop"). The reader is reminded that Sets 1 and 2 involved other tasks not included in the analysis for this paper, which made it less easy for the participating students to recognize the task at hand. The time interval between Step 3 and Step 4 was estimated to be approximately 20 minutes, as Step 3 was at the beginning of the survey and Step 4 was at the end of the survey.
Figure 2. Measuring Self-Efficacy for the Specific Task in Sets 1 and 2
In Step 5, the students were told to solve the task and to write the answer in a text box. Persistence was measured as the time from when a student was told to solve the task until he or she pressed the "send" button. The survey did not allow students to navigate between pages, making us more confident that the time measured was used only to solve the impossible task.

Statistical Analyses
We used the SPSS software package to test the hypotheses. First, the pretest-posttest control group design was used to measure the variable "difference in students' self-efficacy," which is the difference in students' self-efficacy from Set 1 to Set 2 (self-efficacy_Set2 - self-efficacy_Set1). This variable was used to analyze RQ1 and RQ2 by using a posttest-only control group design and was tested for normality using the method described by Kim (2013) and West et al. (1995). According to Kim (2013), when the sample size is 50 ≤ n < 300, as it was in each of the four groups, normality can be assumed if the absolute z-values for skewness and kurtosis are below 3.29. As our results met the assumption of normality, we used a series of independent samples t-tests to investigate how the level markings of the task affected differences in students' self-efficacy by comparing the EGs' results with the CG's (RQ1).
To differentiate between low and high "baseline self-efficacy," we recoded the variable self-efficacy_Set1 into a two-level ordinal variable (low and high self-efficacy_Set1) using a median split, as suggested by Collins (1984). In this way, the participants in each of the four groups were divided into two roughly equal cell sizes, resulting in eight subgroups. Students' "baseline self-efficacy" was recoded from self-efficacy_Set1 as follows: high baseline self-efficacy (self-efficacy_Set1 ≥ 80) and low baseline self-efficacy (self-efficacy_Set1 < 80). To distinguish between the two subgroups, we marked each of the four groups (CG, EG1, EG2, and EG3) with the letter "L" for low baseline self-efficacy and the letter "H" for high baseline self-efficacy (CGL and CGH, EG1L and EG1H, and so forth).
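The recoding described above amounts to a threshold at 80 on the Set 1 self-efficacy score. The sketch below illustrates it; the function names are hypothetical, and the analysis itself was carried out in SPSS rather than with this code.

```python
def baseline_group(self_efficacy_set1, cutoff=80):
    """Return 'H' (high) for scores at or above the cutoff, otherwise 'L' (low)."""
    if not 0 <= self_efficacy_set1 <= 100:
        raise ValueError("self-efficacy is rated on a 0-100 scale")
    return "H" if self_efficacy_set1 >= cutoff else "L"

def subgroup_label(group, self_efficacy_set1):
    """Append the baseline letter to a group name such as 'CG' or 'EG3'."""
    return f"{group}{baseline_group(self_efficacy_set1)}"

print(subgroup_label("CG", 79), subgroup_label("EG3", 80))
```

Because the cutoff is the sample median, students scoring exactly 80 fall into the high group, which is why the two cells are only roughly equal in size.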
According to Kim (2013), when the sample size is n < 50 and the absolute z-values for skewness and kurtosis are below 1.96, we can assume normality. The variable "difference in students' self-efficacy" for students with low baseline self-efficacy in each of the four groups (CGL, EG1L, EG2L, and EG3L) met the normality assumption, and we therefore used a series of independent samples t-tests to investigate H2A. For students with high baseline self-efficacy in three of the four groups (CGH, EG2H, and EG3H), the variable met the normality assumption. However, for EG1H the variable did not meet the normality assumption. Therefore, to investigate H2B, we used a combination of independent samples t-tests and the Mann-Whitney U test.
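The normality screening can be sketched as follows. The cutoffs (1.96 for n < 50 and 3.29 for 50 ≤ n < 300) are those stated in the text; the skewness, kurtosis, and standard-error formulas are the bias-adjusted sample versions commonly reported by statistical packages, which is an assumption about what was computed here. Function names are illustrative.

```python
import math

def skew_kurtosis_z(values):
    """Return (z_skewness, z_kurtosis): each statistic divided by its standard error."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("all observations are identical")
    sum_z3 = sum(((v - mean) / sd) ** 3 for v in values)
    sum_z4 = sum(((v - mean) / sd) ** 4 for v in values)
    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum_z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

def assume_normal(values):
    """Apply the sample-size-dependent cutoffs described in the text."""
    n = len(values)
    if n < 50:
        cutoff = 1.96
    elif n < 300:
        cutoff = 3.29
    else:
        raise ValueError("sample size outside the ranges described in the text")
    z_skew, z_kurt = skew_kurtosis_z(values)
    return abs(z_skew) < cutoff and abs(z_kurt) < cutoff

print(assume_normal(list(range(-10, 11))))
```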
In addition, we used a posttest-only control group design to analyze RQ3 and RQ4. As in previous research (Jacobs et al., 1984), we set a maximum limit on the time spent solving the task and recorded the variable "time" as the amount of time spent solving the task, capped at 20 min. Since we could not assume normality, we used a Mann-Whitney U test to investigate the difference between students' time spent solving the task for students in the EGs compared to students in the CG (H3). In addition, because the data did not meet the normality assumption, we used a series of Mann-Whitney U tests to examine how the interaction between the level marking of mathematics tasks and students' baseline self-efficacy affected students' persistence (H4A and H4B).
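The persistence analysis can be illustrated with a small self-contained sketch: times are capped at 20 minutes, the two groups are ranked jointly (tied values share their average rank), and U is converted to Z with the normal approximation. The sketch omits tie and continuity corrections, which a statistics package may apply, so its Z can differ slightly from package output. All function names are hypothetical.

```python
import math

CAP_SECONDS = 20 * 60  # maximum time credited per student

def cap_time(seconds, cap=CAP_SECONDS):
    """Limit a recorded time to the 20-minute maximum."""
    return min(seconds, cap)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney(sample_a, sample_b):
    """Return (U, Z) for two independent samples, using the smaller U."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z

# Made-up times in seconds, for illustration only.
marked_difficult = [cap_time(t) for t in [40, 55, 60, 90, 1500]]
no_marking = [cap_time(t) for t in [80, 100, 120, 300, 1300]]
print(mann_whitney(marked_difficult, no_marking))
```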

Results
The following section presents the findings of the study, which aimed at investigating how the level marking of a mathematics task affects students' self-efficacy and persistence. Our RQs and hypotheses were formulated based on what the literature review revealed. First, we present the results of testing H1A, H1B, H2A, and H2B, which all focused on self-efficacy, before we move on to the results of testing H3, H4A, and H4B, which focused on persistence. To measure the effect, the analysis focused on examining the difference between the CG and the EGs.
The results contribute to a new understanding of how students' self-efficacy and persistence can be influenced using level-marked tasks in tiered activities.

The Effects of Level Marking a Task on Students' Self-Efficacy
To test H1A and H1B, we calculated the difference between students' self-efficacy as measured in Set 1 (the task without a level marking) and in Set 2 (the task marked as easy, medium, or difficult or with no level marking). These differences are outlined in Table 1.
As our results met the assumption of normality, Table 2 presents a series of independent samples t-tests (mean tests) used to examine H1A and H1B, where the difference in students' self-efficacy between EG1 and CG (Pair 1), EG2 and CG (Pair 2), and EG3 and CG (Pair 3) was tested. No significant differences were found, except in Pair 3, where there was a statistically significant decrease in students' self-efficacy (t = 1.795, df = 150, p = .037, d = 0.26; a modest effect according to Cohen, 2018).
Note. The test for equality of variances (Cohen et al., 2018) was not significant for Pairs 1 and 2 (p > .05), and equal variances were assumed. In Pair 3 (EG3 and CG), equal variance was not assumed.
To test H2A, we calculated the difference in self-efficacy as measured in Set 1 and Set 2 for students with low baseline self-efficacy (see Table 3). As our results met the assumption of normality, Table 4 shows a series of independent samples t-tests used to examine H2A, in which the difference in students' self-efficacy between EG1L and CGL (Pair 1), EG2L and CGL (Pair 2), and EG3L and CGL (Pair 3) was tested. For students with low baseline self-efficacy, no significant change was found in their self-efficacy when an impossible task was marked by a difficulty level.
Note. The test for equality of variances (Cohen et al., 2018) was not significant for Pairs 1-3 (p > .05), and equal variances were assumed.
In the same manner, the procedure used when testing H2A was repeated for H2B. The difference in self-efficacy as measured in Sets 1 and 2 for students with high baseline self-efficacy is presented in Table 5. As our results met the assumption of normality, Table 6 shows the results of a series of independent samples t-tests used to examine H2B, in which the difference in students' self-efficacy between EG2H and CGH (Pair 1) and EG3H and CGH (Pair 2) was compared. Interestingly, in Pair 2, there was a statistically significant decrease in students' self-efficacy (t = 3.783, df = 88, p < .001, d = 0.80; a large effect according to Cohen, 2018).
Note. The test for equality of variances (Cohen et al., 2018) was not significant for Pair 1 (p > .05), and equal variance was assumed. For Pair 2 (EG3H and CGH), equal variance was not assumed. ** The mean difference was significant at the .05 level (p < .05).
The difference in students' self-efficacy in EG1H did not meet the normality assumption, and we therefore utilized a Mann-Whitney U test (a median test) to compare students' median difference in self-efficacy between EG1H and CGH. As shown in Table 7, the median (Med) in EG1H (Med = 5.0) was not statistically significantly different from that in CGH (Med = 10.0, p = .379).

The Effects of Level Marking on Students' Persistence
To test H3, we investigated how "easy," "medium," and "difficult" markings affected students' persistence by comparing the students' time spent on tasks without level markings (CG) and with level markings (EGs). The median time (in seconds) for all students (n = 331) is shown in Table 8. As shown in Table 8, the Mann-Whitney U tests revealed that the time spent solving a task was not significantly different for tasks with "easy" and "medium" markings compared to the CG. Interestingly, one Mann-Whitney U test revealed a significant difference between students' time spent on the task marked "difficult" (Med = 55.5, n = 86) and on the task with no level marking (Med = 100.0, n = 83; U = 2825, Z = -2.340, p = .019, r = 0.18; a small effect according to Cohen, 2018).
To test H4A, we investigated how "easy," "medium," and "difficult" markings affected students' persistence by comparing the time spent on tasks without a level marking (CGL) and with level markings (EG1L, EG2L, and EG3L) for students with low baseline self-efficacy. For students with low baseline self-efficacy (n = 161), the median time (in seconds) they spent trying to solve a task with different level markings is shown in Table 9. As shown in Table 9, the Mann-Whitney U tests for students with low baseline self-efficacy revealed no significant differences in students' time spent on a task with any of the level markings compared to no level marking (CGL).
To test H4B, we investigated how the different level markings affected persistence by comparing time spent on the task without a level marking (CGH) and with level markings (EG1H, EG2H, and EG3H). For students with high baseline self-efficacy, the median time (in seconds) spent solving tasks with different level markings is shown in Table 10. The Mann-Whitney U tests for students with high baseline self-efficacy revealed that the time spent solving a task was not significantly different for easy- and medium-level markings compared to the control group. However, the Mann-Whitney U test revealed a significant difference in students' time spent on a task with a "difficult" marking (Med = 72.00, n = 45) compared to no level marking (Med = 144.0, n = 45; U = 742.00, Z = -2.184, p = .029, r = 0.23; a small effect according to Cohen, 2018).
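The effect sizes reported for the two significant persistence results are consistent with the common convention r = |Z| / sqrt(N), where N is the combined size of the two groups being compared. The text does not state which formula was used, so the convention is an assumption here; the Z values and group sizes are taken from the results above.

```python
import math

def effect_size_r(z, n_total):
    """Effect size r for a Mann-Whitney U test: |Z| divided by sqrt(N)."""
    return abs(z) / math.sqrt(n_total)

# All students: "difficult" marking (n = 86) versus no marking (n = 83).
r_all = effect_size_r(-2.340, 86 + 83)

# High baseline self-efficacy: "difficult" (n = 45) versus no marking (n = 45).
r_high = effect_size_r(-2.184, 45 + 45)

print(round(r_all, 2), round(r_high, 2))
```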

Discussion
Level-marked tasks are used in tiered activities (e.g., Brändström, 2005; Grave & Pepin, 2015) to facilitate mastery experiences. However, little research has been conducted to investigate how difficulty labels in tiered activities contribute to enhancing students' self-efficacy and persistence. The aim of this study was to investigate how the level marking of mathematics tasks affects students' persistence and self-efficacy. The results are discouraging and, in isolation, do not speak in favor of using level-marked tasks in tiered activities. Our findings revealed that none of the labels (easy, medium, and difficult) increased students' self-efficacy or persistence. In fact, the results indicated that marking tasks as difficult reduced both students' mathematics self-efficacy and their persistence. In this section, we discuss our results, organized by our four RQs.

Level Marking of Mathematics Tasks and Students' Self-Efficacy
In RQ1, we asked how the level marking of a mathematics task that is impossible to solve affects students' self-efficacy. We found that when the task that we reported on here was marked as difficult, it had a significantly negative effect on students' self-efficacy. This corroborates what has been reported elsewhere (Herset et al., 2023): marking an easy task as difficult has a negative effect on students' self-efficacy, which suggests that students tend not to judge their ability to solve a task by its content but rather by its level marking.
The results presented by Herset et al. (2023) and the ones we report on here are in good agreement with Bandura (1997) and Street et al. (2017, 2022), who claimed that a person's level of self-efficacy is influenced by their opinion of the difficulty of a task, as well as Liu et al. (2020), who suggested that students' task-specific self-efficacy can be affected by perceived task difficulty and not by assessments of their own capabilities. However, the current study does not support a positive effect of "easy" or "medium" markings of a task on students' self-efficacy, a result that diverges from previous research reporting that students' self-efficacy was higher when they were told that a task was easy compared to when they were told that a task was difficult (Jacobs et al., 1984). That being said, it is important to point out that the study carried out by Jacobs et al. (1984) did not have a control group, and their findings were also limited by the absence of a pretest. One interpretation of the results, which has implications for theory and practice in mathematics education, is that marking a task as "easy" is not necessarily enough to increase students' self-efficacy. A possible reason is that students' previous experiences can influence their self-efficacy: if they experience that tasks marked as "easy" are not always easy to solve, they may become skeptical of the "easy" marking and thus maintain their original self-efficacy.
In RQ2, considering students' baseline self-efficacy, we asked how the level marking of a mathematics task that is impossible to solve affects students' self-efficacy, which led us to set forth two hypotheses. First, regarding students with low baseline self-efficacy, no significant change was found in their self-efficacy after being presented with the level marking of an impossible task, not even when the task was marked as difficult. From this, we infer that students with low self-efficacy in Set 1 considered the task difficult, and when the task was marked as difficult (in Set 2), their perception of the task as difficult did not change. In contrast, for students with high baseline self-efficacy, we found that marking the task as difficult resulted in significant reductions in self-efficacy. This finding is a new contribution to the research field, as the effect seems to be particularly prominent among students with high baseline self-efficacy. Moreover, we found that marking a task as "easy" or "medium" did not affect these students' self-efficacy. A plausible explanation is that students with high self-efficacy in Set 1 may have assessed the level of the task (without a level marking) as easy or moderately difficult, and when the tasks were marked as "easy" or "medium" (in Set 2), this did not affect their self-efficacy. Taken together, our results resonate well with the results of Collins (1984), who found that students with high self-efficacy rated tasks as significantly easier than students with low self-efficacy.

Level Marking of Mathematics Tasks and Students' Persistence
In RQ3, we asked how the level marking of a mathematics task that is impossible to solve affects students' persistence, and we found that the "difficult" marking had a negative effect. That is, when students were faced with a "difficult" marking, the time they spent trying to solve the task decreased, a result that is consistent with those of Herset and El Ghami (2022), who found a negative effect of marking a hard (but solvable) task as difficult on the time students spent trying to solve it. However, as in the case of self-efficacy (RQ1), the current study does not support an effect of "easy" or "medium" markings on students' persistence. This means that there may be a connection between students' self-efficacy and the time spent on an impossible task. This is supported by previous research that found a positive relationship between self-efficacy and persistence (Bong & Skaalvik, 2003; Collins, 1984; Jacobs et al., 1984; Lyman et al., 1984; Multon et al., 1991; Schunk, 1991; Sexton & Tuckman, 1991; Skaalvik et al., 2015), which underlines how important self-efficacy might be for persistence. The findings we report here add to those of Street et al. (2017), who suggested that students' perceived task difficulty affects self-efficacy.
In RQ4, considering students' baseline self-efficacy, we asked how the level marking of a mathematics task that is impossible to solve affects students' persistence, which led us to set forth two hypotheses. First, regarding students with low baseline self-efficacy, no significant change was found in their persistence after encountering the level marking of an impossible task, not even when the task was marked as difficult. Therefore, we argue that level markings do not affect persistence for students with low baseline self-efficacy. This may be because their opinions of the level of difficulty of the task were the same in Sets 1 and 2, regardless of the level marking. Another explanation could be that students with low baseline self-efficacy lacked mastery experience during the survey. When participants were asked to solve the last impossible task, they had already been given other solvable tasks to complete; thus, their mastery experience may have influenced their persistence. This is in good agreement with Sexton and Tuckman (1991), who found that at the beginning of solving a task, generalized self-efficacy and past performance were the best predictors of persistence, but after solving a few tasks, it was students' specific self-efficacy and how much time they had spent on the test previously that best predicted persistence. Furthermore, those with high baseline self-efficacy spent significantly less time trying to solve the task when the task was marked as difficult. The result fits well with Jacobs et al. (1984), who found that students who were told that the task was difficult reduced their self-efficacy and persistence compared to students who were told the opposite. However, our findings suggest something new: this effect is particularly prominent among students with high baseline self-efficacy.

Limitations
We acknowledge the limitations of this study, which are mainly related to the lack of pretesting when investigating students' persistence. In addition, persistence was operationalized as the students' time spent solving an impossible task. We therefore do not know whether the time from receiving the task to pushing the "send" button was used to solve the task or whether the students were doing something else (e.g., some students may have started to daydream). We therefore acknowledge potential alternative interpretations of the study; for instance, the students may have felt the need to hurry because the school day was almost over. To overcome such limitations, we suggest observing students during the survey to examine how they spend their time when working with mathematics tasks with varying level markings. Moreover, interviews can provide insight into why students spend less time solving a task marked as difficult. This issue regarding the lack of knowledge about students' actions while solving the task was also raised by DiNapoli (2023) and is one of the reasons why we included a control group. Since the size of the sample is relatively large (n = 331) and the allocation to the CG and EGs was random, it is plausible that random measurement error was mitigated.
Limitations in the data collection process include the sampling strategy, sample size, and the skewed distribution of participants. Because of COVID-19, we had to choose several schools in one district to obtain enough data. Based on the table of sample sizes, a sample size of 383 students was recommended (Cohen et al., 2018). This means that our full sample was above the limit (n = 436), but unfortunately, due to the missing values, outliers (as described above), and the way in which the students were divided into subgroups, the sample size may be a limitation.

Table 1.
Differences in Students' Self-Efficacy in Sets 1 and 2 for All Four Groups
The decrease in self-efficacy for this group shows the importance of having a CG. There can be changes in the level of self-efficacy during a test (see, e.g.,

Table 2.
Results of Three Independent Samples T-Tests on the Differences in Students' Self-Efficacy
Note. *The mean difference is significant at the .05 level (p < .05). **Levene's test for equality of variances

Table 3.
Difference in Self-Efficacy in Sets 1 and 2 for Students with Low Baseline Self-Efficacy for All Four Groups

Table 4.
Results of Three Independent Samples T-Tests on the Difference in Self-Efficacy for Students with Low

Table 5.
Differences in Self-Efficacy in Relation to Different Level Markings for Students with High Baseline Self-Efficacy

Table 6.
Results of Two Independent Samples T-Tests on Differences in Self-Efficacy for Students with High

Table 7.
Results of One Mann-Whitney U Test on the Difference in Self-Efficacy for Students with High Baseline

Table 8.
Results of Three Mann-Whitney U Tests on Students' Time Spent on Tasks
Note. Med: Median. *The difference was significant at the .05 level.

Table 9.
Results of Three Mann-Whitney U Tests on Time Spent on Tasks among Students with Low Baseline Self-Efficacy

Table 10.
Results of Three Mann-Whitney U Tests on Time Spent on Tasks among Students with High Baseline
Med: Median. *The difference was significant at the .05 level.
Published by IDEAS SPREAD