Saturday, November 10, 2012

Needs Improvement: An Evaluation of CPS's REACH Students

With the new contract, CPS has rolled out a new evaluation program called REACH Students. The program was developed by a joint committee of CPS people and Chicago Teachers Union people. That committee was required by a 2010 Illinois state law called the Performance Evaluation Reform Act (PERA), which was passed as part of Illinois's application for federal education money under Race to the Top. Race to the Top required states to include pilot programs for performance-based pay, so PERA was designed to pave the way for "real" evaluations whose ratings could be used to differentiate teachers--to reward "the best" and fire "the worst." (Does this line sound familiar from the presidential campaign? It should, since President Obama says it all the time, except when he sort of fudges it and leaves out the firing part. But most research shows that it's really, really hard to evaluate teachers for the purpose of ranking them this way. One stat I heard said that only about 5% of teachers can be considered "the best" year after year, and only about 5% can be considered "the worst." The other 90% fall in the middle.)

So PERA also made rules for how the new evaluation systems would be formed. Funny thing is, when the Illinois Senate crafted this law, it made different rules for the Chicago Public Schools than for every other district in Illinois. In the rest of the state, teachers unions and districts would negotiate over the system, and if they couldn't reach an agreement, the system would default to a generic one formulated by the Illinois State Board of Education (ISBE). In CPS, if CPS and CTU couldn't reach an agreement, then CTU would be required by state law to accept CPS's "last best offer."

CTU walked away from the table after 4 months of negotiations. Publicly and on paper, the strike was about wages and benefits; state law dictates that these are the things over which unions can call a work stoppage. In reality, and not-so-secretly, the strike was about many more things, and the evaluation system was at the top of the list. CTU ended up winning concessions on the evaluations: before the strike, 40% of a teacher's rating was ultimately going to depend on student performance measures, i.e., test scores and performance-based assessments (different, district-created tests that teachers grade themselves). After the strike, that proportion will max out at 30% in year 3 of the new contract.

What is the other 70%? Classroom practice, i.e., observations by "qualified evaluators." Only administrators can become qualified evaluators. Every non-tenured teacher must be formally observed 4 times per year (a big improvement over the previous system's 1-2 times), and each formal observation must follow a very strict procedure involving pre- and post-conferences and the gathering of evidence. All of this is to the good: under the new system, it is much, much harder for an administrator to give a teacher an unsatisfactory rating without proof that the teacher's performance really is unsatisfactory, and the system is designed, in theory, to be more supportive of teacher development. Instead of one observation and rating per year, this system requires that teachers receive some sort of coaching so that their practice can improve.

The rubric that CPS has adopted is called "The CPS Framework for Teaching Adapted from the Danielson Framework for Teaching and Approved by Charlotte Danielson." I call it a rubric because that's what it is--a gigantic rubric. And it's a great rubric--it really does describe some best practices, and it's research-based. I'm very familiar with the Danielson rubric because AUSL has used it for years, beginning just before my residency year. We were given time to study it as residents, and we used it to evaluate ourselves. It was also used to rate us--something that Charlotte Danielson reportedly said should not be done. But apparently now Danielson (or "Charlotte," as many of AUSL's leaders call her) is OK with that, since the CPS Framework for Teaching was approved-by-her.

Interestingly, Danielson's 4 ratings have retained their names within the rubric itself--Unsatisfactory, Basic, Proficient, and Distinguished--but CPS's evaluation system gives the equivalent ratings different names: "Basic" becomes "Developing" and "Distinguished" becomes "Excellent." Both of these changes are revealing. When I was a resident, we were constantly told that "Basic" was where a first-year teacher could be expected to be most of the time. They also had a saying about Distinguished: "It's a nice place to visit, but don't expect to live there." Now, in CPS, it seems like we're likely to experience some grade inflation, if you will. Teachers who would normally get "Basic," the equivalent of a C, might now get "Proficient," the equivalent of a B. When I taught college, I often gave Bs to students I thought deserved Cs in order to avoid the time suck of debating with students over their grades. I imagine that principals will feel the same way. Meanwhile, where "Distinguished" was once reserved for only award-winning teachers, it will now be given to any teacher deemed "Excellent." Now, I know that CPS has always pressured administrators to be very stingy with the "Excellent" rating, so this might not be as much of a problem. But when they try to reintroduce performance pay in the next contract, we'll see which teachers start angling for that Excellent rating.

From the union side, the change of "Basic" to "Developing" (a euphemism that the CTU won instead of having to use PERA's label, "Needs Improvement") creates an upward push to categorize teachers as "Proficient" who might not actually be proficient on Danielson's scale. CPS wanted to say that a teacher who earned a "developing" rating for two consecutive years would automatically be rated "unsatisfactory." This is how the CTU could say that the new system was putting thousands of teachers at risk for dismissal. In the final contract, two "developing" ratings will only turn into an "unsatisfactory" if you don't, in so many words, actually "develop."

In sum, and setting aside the use of test scores--which, I repeat, are at present an unreliable indicator of teacher proficiency or student growth--the new evaluation system does a lot more than the old one did to protect teachers.

But now let's look at implementation. Let me preface this with a BIG CAVEAT: What I'm writing below is not intended in any way to impugn the administrators at my school or any of the other CPS administrators whom I know well. In fact, I think it will show the ways in which CPS administrators have had their hands tied by CPS even more than teachers have. After all, administrators do not have a union; they are the only people who work in a CPS building who are not in one. Thus they, like teachers, are forced to implement all kinds of policies that have not been well thought through or are only, as I say below, in the "rudimentary" stages. Their job is tough. My beef is not with them. It is with CPS.

First, let's see how Charlotte Danielson describes a "Proficient" assessment system (Component 1e):
(1) Teacher’s plan for student assessment is aligned with the standards-based learning objectives identified for the unit and lesson; (2) assessment methodologies may have been adapted for groups of students. (3) Assessments clearly identify and describe student expectations and provide descriptors for each level of performance. (4) Teacher selects and designs formative assessments that measure student learning and/or growth. (5) Teacher uses prior assessment results to design units and lessons that target groups of students.
I've numbered each of the sentences so that we can evaluate CPS's evaluation system against them one at a time, and, for the sake of this argument, I've replaced the word "teacher" with "CPS" and "student" with "teachers."
(1) CPS's plan for teacher assessment is aligned with the standards-based learning objectives identified for the unit and lesson
This is somewhat true. If the objective is to produce "excellent" teachers, then CPS has identified a standard (Danielson's Framework) and they're using an assessment (Danielson's Framework) that is aligned with the standard. So I would give CPS a P (Proficient) in this element.
(2) assessment methodologies may have been adapted for groups of teachers
Again, this is somewhat true. Probationary Appointed Teachers (PATs or untenured teachers) are being observed 4 times, and tenured teachers will be observed once (a minimum) or twice (at least 50% of tenured teachers in a building must be observed twice). The framework for those teachers, however, is the same, and the point scales used to determine final ratings are the same. The descriptors for "Basic" and "Unsatisfactory" don't say anything about differentiating assessment for different groups, but I think I have to give CPS a B (Basic) in this element.
(3) Assessments clearly identify and describe teacher expectations and provide descriptors for each level of performance.
Expectations are certainly clearly identified--check. But many teachers in the district began this year with little or no familiarity with the Danielson Framework. We had our first PD about Danielson yesterday, after the first round of observations had already taken place. (It was a very good PD, because our administrators, unlike some, want us to be successful. Our administrators also have lots of familiarity with Danielson already because they have been in AUSL schools for years.) So I would give CPS a P- (Proficient-minus) here. The expectations are clearly described, but they are long and complex and have not been taught to us. We'll come back to that when we get to Component 3d.
(4) CPS selects and designs formative assessments that measure teacher learning and/or growth. 
The answer to this one is yes and no. We have four observations, and we don't get our "summative" final rating until the end of the year. But we still don't know (and our admin doesn't, either) how the four observations will be used to determine that final rating. Will they be averaged? Will growth be taken into account? We don't have enough information to say. I haven't yet received my grades from my first observation, so I don't know whether they are designed to reflect growth. Let's look at what Danielson says for "basic" in this part of the component:
Teacher’s approach to the use of formative assessment is rudimentary, only partially measuring student learning or growth.
Since this is the first year of the evaluation system, I might actually go along with a word like "rudimentary." The electronic system that admin will be using to deliver our scores and that we'll be using to see our scores is not yet up and running. So CPS gets a B here. One more:
(5) CPS uses prior assessment results to design units and lessons that target groups of teachers. 
Again, not sure about this one yet. I'm supposed to get my first scores on Tuesday, so I suppose I'll find out then what kinds of supports I'll be getting to improve. I feel bad for my administration here, though, because we don't have very much professional development time in our calendar at all this year. Will they differentiate instruction for teachers with different ratings? It remains to be seen. So we'll give CPS an N/A here. Not enough information available yet.

Overall rating for component 1e: B+ (Basic-plus. I averaged the scores using the numbers 1-4 for the ratings and got an average of 2.375.)
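If you want to check my math, here's a quick sketch of the arithmetic. The fractional values I'm assigning to the plus/minus modifiers (like counting that P- as 2.5) are my own back-of-the-envelope choices, not anything CPS has published:

```python
# Back-of-the-envelope average for component 1e, using 1-4 for the ratings
# (U=1, B=2, P=3, D/E=4) and my own assumption that a minus is worth
# half a point (so P- = 2.5). The N/A element (5) is left out entirely.
scores_1e = {
    "(1) aligned with standards": 3.0,   # P
    "(2) adapted for groups":     2.0,   # B
    "(3) clear expectations":     2.5,   # P-
    "(4) formative assessments":  2.0,   # B
}

average = sum(scores_1e.values()) / len(scores_1e)
print(average)  # 2.375 -- between Basic (2) and Proficient (3), hence B+
```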

OK, phew. That took over an hour to write. Now let's look at the use of assessment in the Instruction domain (Domain 3). The relevant part is component 3d, "Using Assessment in Instruction." Here's the language for a Proficient rating:
(1) Teacher regularly uses formative assessment during instruction to monitor student progress and to check for understanding of student learning. (2) Teacher uses questions/prompts/assessments for evidence of learning. (3) Students can explain the criteria by which their work will be assessed; some of them engage in self-assessment. (4) Teacher provides accurate and specific feedback to individual students that advances learning.
OK, let's take these one at a time. Again, I'm going to change the word "teacher" to "CPS" and the word "students" to "teachers."
(1) CPS regularly uses formative assessment during instruction to monitor teacher progress and to check for understanding of teacher learning.
Once again, the system requires 4 formal observations. There can also be any number of informal observations, i.e., "spot checks," where the evaluator can walk into your room at any time. For an informal observation, you're supposed to receive feedback. Everyone knows that working in an AUSL school turns your classroom into a fishbowl, so yes, I've had people walk in a lot. But I don't think any of those walk-ins have been official "informal observations," because I haven't gotten much feedback about them. (To be clear, this is not a criticism of my administrators!! They are doing their utmost with a system that was not designed by them.) CPS also says that teachers can opt to have the first of their four observations treated as a "practice," which means you can use the feedback from that one to grow. If you take the first one as a practice, then there is a required end-of-year informal observation (at least, I think it comes after all four formals). So this piece remains to be seen, and as the REACH system is written, it's not clear what is "formative" and what is "summative." But it does seem like the observations are going to be used to monitor progress. So CPS gets a P.
(2) CPS uses questions/prompts/assessments for evidence of learning.
Yes, this works. In our post-conference, we have a form with questions. And administrators have to provide evidence for our ratings. Another P.
(3) Teachers can explain the criteria by which their work will be assessed; some of them engage in self-assessment.
My students have this text lingo word that is spelled skuuuuuuuuuurrrrrr! It is the sound of a car putting on its brakes and means "back it up!" Teachers had a preliminary "training" about REACH during our week-long institute at the beginning of the school year, where we were read a script and asked to sign a piece of paper saying we understood the script. Our next PD about REACH was yesterday. So, in the first quarter of the year, we've received about 2.5 total hours of PD about the evaluation system. People are confused--and that's in my building, where the majority of teachers were already intimately familiar with Danielson's Framework. Only about 1 hour of the PD we've received was scripted by CPS; the rest was designed and given by our admin in order to support us. So I really can't imagine how teachers who have never seen Danielson before must be feeling. Plus, we still don't have a clue what the score ranges in our contract mean in real life. The language for "Unsatisfactory" sounds more appropriate to my empirical observations and my guesses based on how much PD has been required by CPS: "[Teachers] cannot explain the criteria by which their work will be assessed and do not engage in self-assessment." I'm an easy grader, so I'll give CPS a B- (Basic-minus) on this one, since some teachers in AUSL schools can describe the criteria and know how to self-assess. Next sentence, please.
(4) CPS provides accurate and specific feedback to individual teachers that advances learning.
OK, CPS created this reportedly awesome online system that evaluators can use to upload their observation data and provide feedback to teachers. I say "reportedly" because IT IS NOT ONLINE YET. It is only in its pilot stage. But most schools have already completed their first round of observations. While we are waiting for the system to go live, we're being given pencil-and-paper grades (which are sitting in my school mailbox as I write) and we're supposed to email CPS if we have a question. The language in Danielson for Unsatisfactory says "[CPS]’s feedback is absent or of poor quality." Now, remember my caveat! This is not a criticism of my administration. They are being asked to use a system that is not yet up-and-running to report our assessment results. How messed up is that? I'll give CPS a U+ on this one.

So, CPS's average score for component 3d, at 2.19, is Basic.
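Same arithmetic as before, for anyone checking: assuming the B- counts as 1.5 and the U+ as 1.25 (again, my own informal fractions, not an official scale), the four scores average out to about 2.19:

```python
# Back-of-the-envelope average for component 3d, on the same 1-4 scale,
# assuming B- = 1.5 and U+ = 1.25 (my own informal fractions).
scores_3d = {
    "(1) monitors progress":    3.0,    # P
    "(2) evidence of learning": 3.0,    # P
    "(3) criteria explained":   1.5,    # B-
    "(4) specific feedback":    1.25,   # U+
}

average = sum(scores_3d.values()) / len(scores_3d)
print(average)  # 2.1875, which I rounded to 2.19 -- squarely Basic
```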

Wow, that took me several hours to write. And I was only looking at 2 components out of a total of 19. My school has more than 50 non-tenured teachers who must each be formally observed 4 times a year by one of 3 administrators--that's more than 200 formal observations, each with a pre-conference, a post-conference, and evidence to gather. They have A TON OF WORK TO DO. Did anyone think through these logistics?

Two components, two scores of Basic. Sounds like CPS Needs Improvement. Let's hope they have this up and running by next year, or they might get canned. Given their past scores in Component 4a, Reflecting on Teaching and Learning, I'm not optimistic.