Last time I reported on our results from the group classes at 626, I discussed an issue that came out of the way we tabulated the statistics. Just as I developed a system to correct that problem, I decided to revamp our entire testing regimen.
We were testing every 6 weeks, which I actually think is a solid frequency for beginner and intermediate athletes. The problem… well, not so much a problem I guess, but I think people appreciate a little more variety in their testing. And as much as I love the frequency and consistency of data, I need to be cognizant that the gym isn’t a 100% laboratory environment :).
At our previous testing frequency, test week shows up in the program between 8 and 9 times per year (52 weeks / 6 ≈ 8.7). When phrased that way, it really doesn’t seem that often, but for those participating… well, I got the feeling some might be getting a little bored.
I decided to use the situation as an opportunity. Instead of reducing the number of testing weeks each year, I’m keeping the testing frequency fixed at 6-week intervals. However, the contents of the testing week will now be varied. This means more variety for members and more data for me!
Under the new system, we will alternate between two variations in testing.
Test Week A
Theme | Test |
---|---|
Upper Body Push | 1 RM Close Grip Bench Press |
Upper Body Push | 1 RM Shoulder Press |
Upper Body Push | 1 RM Split Jerk |
Upper Body Pull | 1 RM Weighted Pull Up |
Core | Tabata Sit-ups |
Core | Side Bridge Hold |
Single Leg | 8 RM Split Squat |
Bending | Sorensen Hold |
Work Capacity | 30 sec Air Dyne for Cals |
Work Capacity | 400 m Run Time Trial |
Work Capacity | Grace |
Work Capacity | 2k Row Time Trial |
Work Capacity | Cindy |
Test Week B
Theme | Test |
---|---|
Double Leg | 1 RM Back Squat |
Double Leg | 1 RM Front Squat |
Bending | 1 RM Deadlift |
Bending | 1 RM Power Clean |
Bending | 1 RM Power Snatch |
Upper Body Push | 1 RM Weighted Dip |
Work Capacity | 30 sec Row for Cals |
Work Capacity | 500 m Row Time Trial |
Work Capacity | Fran |
Work Capacity | Helen |
Win-win: people aren’t bored, and I have more data points to evaluate program effectiveness and balance of fitness across our membership.
Testing Split and New Additions
There is a method to how the tests are divided between Week A and Week B. For instance, you might think it’d be better to test Back Squat in one of the weeks and Front Squat in another.
However, I’ve got those tests in the same test week because I like to see a balance between the two lifts at a specific percentage. If I measure them at two different instances separated by 6 weeks, it will skew the results. So if you see something you consider odd in the test grouping, this is likely the rationale.
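To make that skew concrete, here is a rough sketch with made-up numbers. The loads, the assumed 6-week improvement, and the function name `squat_ratio` are all hypothetical and only there for illustration; they are not data from our gym or a target ratio I'm prescribing. It compares the front-to-back squat ratio measured in the same test week against the ratio you would get if the back squat were tested 6 weeks later.

```python
def squat_ratio(front_squat, back_squat):
    """Return the front squat as a fraction of the back squat."""
    if back_squat <= 0:
        raise ValueError("back squat must be a positive load")
    return front_squat / back_squat


# Hypothetical athlete (loads in arbitrary but consistent units).
front_now = 200.0
back_now = 250.0
back_six_weeks_later = 260.0  # assumed strength gain over 6 weeks

# Both lifts tested in the same week: a like-for-like comparison.
same_week = squat_ratio(front_now, back_now)

# Lifts tested 6 weeks apart: the ratio mixes two fitness levels.
split_weeks = squat_ratio(front_now, back_six_weeks_later)

print(f"same week:   {same_week:.3f}")
print(f"split weeks: {split_weeks:.3f}")
```

In this made-up case the split-week ratio comes out lower, not because the athlete's front squat is lagging, but simply because the back squat was measured after 6 more weeks of training.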
Another change I made was to the work capacity tests. Previously, all of the tests were workouts that I designed, with the intent to target specific energy systems for the largest cross section of our membership. Because we used these workouts as our tests, and given my philosophy on training vs. testing, our members never got the opportunity to do some of the standard CrossFit benchmarks.

Couples that test together, stay together!
This isn’t necessarily a problem, but I do feel that some of the early benchmark workouts are fantastic tests of fitness. I also think hitting those benchmarks adds to some of the fun around participating in CrossFit. I can remember comparing my results on benchmarks with friends at other gyms or even some of the top athletes (when I wanted to feel bad about myself). So I’ve gone back and added in a handful of these workouts, which you’ll notice in the tables above.
These benchmarks weren’t selected at random, and we have specific guidelines for doing them on testing day. For instance, there aren’t to be any 20-minute Fran times: we scale the workout to achieve the high turnover that is so magical about Fran and to test the appropriate energy system.
Results
With all of the outlined changes, the data available for reporting today is greatly reduced. The overlap between Test Week A and the old test week consists only of: Sorensen, Weighted Pull Up, Tabata Sit ups, Side Bridge and 400m Run.
Further reducing the data, as mentioned in the last testing post, I’m only considering data points from athletes who participated in both the current testing week and the one 6 weeks prior. In some cases, as you’ll see below, the sample size is quite small.
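For anyone curious what that filtering looks like, here is a minimal sketch of the idea. It is not my actual spreadsheet; the athlete names, the scores, and the helper names `paired_results` and `average` are all invented for illustration. Only athletes with a result in both test weeks are kept, and the averages and sample size are computed from that paired group.

```python
def paired_results(previous, current):
    """Keep only athletes who have a result in both test weeks.

    previous, current: dicts mapping athlete name -> result.
    Returns a dict mapping name -> (previous result, current result).
    """
    in_both = previous.keys() & current.keys()
    return {name: (previous[name], current[name]) for name in sorted(in_both)}


def average(values):
    """Plain arithmetic mean of a non-empty collection."""
    values = list(values)
    if not values:
        raise ValueError("no paired results to average")
    return sum(values) / len(values)


# Hypothetical results for one test (not real gym data).
july = {"athlete_a": 95, "athlete_b": 110, "athlete_c": 80}
august = {"athlete_b": 112, "athlete_c": 84, "athlete_d": 70}

pairs = paired_results(july, august)
sample_size = len(pairs)
july_avg = average(prev for prev, _ in pairs.values())
august_avg = average(curr for _, curr in pairs.values())

print(sample_size, july_avg, august_avg)
```

Here athlete_a and athlete_d each missed one of the two weeks, so they drop out and the sample size shrinks from four people to two. That is exactly why some of the sample sizes below are so small.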
Theme | Test | July 2014 | August 2014 | Sample Size |
---|---|---|---|---|
Bending | Sorensen Max Hold | 1:44 | 1:43 | 25 |
Core | Side Bridge Hold | 1:27 | 1:26 | 12 |
Core | Tabata Sit Ups | 9.3 | 9.6 | 9 |
Upper Body Pulling | Pull up 1RM (male) | 31.8 | 36.8 | 6 |
Upper Body Pulling | Pull up * 3 (male) | 95.3 | 110.4 | 6 |
Upper Body Pulling | Pull up 1RM (female) | 2.5 | 8.75 | 2 |
Upper Body Pulling | Pull up * 5 (female) | 12.5 | 43.75 | 2 |
Upper Body Pulling | % Unloaded | 11% | 11% | 9 |
Work Capacity | 400 m Run | 1:18 | 1:20 | 11 |
What do we notice here? Well, even though they are small changes, the Sorensen, Side Bridge and 400 m Run results are all headed in the wrong direction. What’s the deal? In the case of all the tests other than Sorensen, the sample size is tiny, which makes interpretation of the data that much more susceptible to outliers.
For instance, the average 400 m run time went up by two seconds. When I reviewed the data I saw that most individuals improved modestly, but one individual completed the run substantially slower, which ultimately led to the adverse change in the average.

Don’t trip… you’ll screw up the results!!!
Even for Sorensen, a test where we had a large sample size, most people saw improvements, but two individuals performed substantially worse than the previous test – like off by over a minute each, which again ultimately torpedoed the average.
How can we correct for this? Well, I think it’s going to require beefing up the statistical analysis, which I plan to experiment with over the coming months. I would also like to see sample sizes increase, but I think they were extraordinarily small this time around due to the shake-up in the testing regimen.
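As one possible direction (an assumption on my part, not something I've settled on), here is a small sketch that looks at each athlete's individual change and reports the median change and the share of athletes who improved alongside the mean. The hold times are invented to mimic the Sorensen situation, where most people improve a little and one person drops by over a minute; the function name `summarize_changes` is hypothetical too.

```python
from statistics import mean, median


def summarize_changes(previous, current, lower_is_better=False):
    """Summarize per-athlete changes between two paired test weeks.

    previous, current: equal-length lists where index i is the same athlete.
    lower_is_better: set True for time trials, False for holds and loads.
    """
    if len(previous) != len(current) or not previous:
        raise ValueError("need two non-empty lists of equal length")
    changes = [curr - prev for prev, curr in zip(previous, current)]
    if lower_is_better:
        improved = sum(1 for change in changes if change < 0)
    else:
        improved = sum(1 for change in changes if change > 0)
    return {
        "mean_change": mean(changes),
        "median_change": median(changes),
        "share_improved": improved / len(changes),
    }


# Hypothetical hold times in seconds: four small gains, one big drop.
previous = [100, 95, 110, 120, 105]
current = [104, 98, 113, 125, 40]

summary = summarize_changes(previous, current)
print(summary)
```

With these made-up numbers the mean change is negative even though four of five athletes improved, while the median change and the share who improved tell the story I actually saw when I went through the results by hand. The median isn't a cure-all with samples this small, but it is less sensitive to one or two bad days.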
Summary
Once again, we aren’t in a position to leverage these results to influence training, but we are learning as we go and, I believe, inching our way closer to something useful.
I didn’t expect this to be a smooth process, which is good because it hasn’t been. I also wasn’t sure if we’d be able to actually gather useful data due to all the confounders associated with group fitness training – and I think the jury is still out on that one too.
Regardless, I believe there is a lot to learn on this journey and hopefully some of you out there find it interesting.