[we'd like to thank Prof. Paul Jakus, @paulj, for yet another thought-provoking statistical analysis of Phish.net data - ed.]
Phish.net show ratings are meant to convey Phish fans’ collective perception of how good a show was, but these ratings are subject to a number of biases. For example, .net ratings do not come from a random sample (sampling bias), and people tend to rate the shows they’ve attended quite highly (attendance bias).
Another possible bias, which the .net Cognoscenti have termed “Recency Bias”, is the tendency to rate a show during the first few days after the performance, if not immediately after the show. It is believed that ratings posted in the immediate aftermath of a concert will reflect the warm glow of that experience. People have not taken the time to reflect on the quality of that show relative to the performances immediately before or after, or within the context of an entire Phish tour. Recency bias implies that a show’s rating will decline as its warm glow dissipates.
It occurred to me that I could estimate the magnitude of recency bias using a Phish show database I’ve periodically updated since Summer 2018. We’ll look solely at the 21-show Summer 2018 tour, which started at Lake Tahoe on July 17 and ended at Dick’s on September 2. For each show, we can use snapshots of .net ratings taken on October 2, 2018, on May 5, 2019, and on April 2, 2020. Thus, we have ratings taken one month, eight months, and 19 months after the conclusion of tour.
Here are the ratings time paths of three Summer 2018 shows [Gorge Night 3 (7/22/18), Bill Graham Civic Auditorium Night 2 (7/25/18), and The Forum Night 1 (7/27/18)]:
Gorge3 and Forum1 both show slightly declining ratings over time, while BGCA2 shows a slight uptick. Gorge3 fell by 0.118 points, as 95 new ratings came in between October 2018 and April 2020. In contrast, over this same time period, ratings by 34 new people pulled the BGCA2 rating up by 0.058 points—so immediate ratings might not always be “too high”.
On average, Phish.net Summer 2018 show ratings were about 0.051 points lower in April 2020 than they had been in October 2018. This sort of observation—declining ratings over time—is why people were thinking about recency bias. However, this simple difference doesn’t measure the bias because the April 2020 rating includes the contributions of both those who rated while still in the warm glow of tour (rating while “hot”) and the “cooler heads”—those who waited until well after the tour had concluded.
Fortunately, we can use some simple algebra to extract the implicit average show rating for the cooler heads: multiply the mean rating by the number of raters for April 2020 and again for October 2018, take the difference, and then divide by the number of new raters. (This approach assumes that no one who rated a show before October 2018 went back and changed their rating.) The “cool” ratings are based on anywhere from 29 to 164 new raters for a given show (mean=80) so the sample sizes are reasonable for this calculation. This is what we get:
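The algebra above can be sketched as a small function. The formula is just a weighted-mean decomposition: total points in April minus total points in October, divided by the number of new raters. The specific numbers below are hypothetical, chosen only for illustration; they are not the actual .net data.

```python
def cool_rating(n_early, r_early, n_late, r_late):
    """Back out the implied mean rating of people who rated between
    two snapshots, assuming no earlier rater changed their rating.

    n_early, r_early: rater count and mean rating at the first snapshot
    n_late,  r_late:  rater count and mean rating at the second snapshot
    """
    new_raters = n_late - n_early
    if new_raters <= 0:
        raise ValueError("need at least one new rater between snapshots")
    return (n_late * r_late - n_early * r_early) / new_raters

# Hypothetical example: 470 raters averaging 4.40 in October 2018,
# 565 raters averaging 4.28 in April 2020 (so 95 new "cool" raters).
print(round(cool_rating(470, 4.40, 565, 4.28), 3))  # → 3.686
```

Note how a modest drop in the overall mean (4.40 to 4.28) implies a much lower average among the new raters alone, since the early raters dominate the total.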
Tahoe1, Gorge3, and Forum1, in particular, did not fare as well when cooler heads prevailed, with drops of 0.5 points or more. However, BGCA2 was more than 0.5 points higher than it was in October 2018.
The average Summer 2018 show rating, as measured in October 2018, was 4.065; when measured using only the cooler heads, the average show rating is 3.768. This implies a mean recency bias of almost 0.3 points (7.3%). Some 57% of Summer 2018 shows were rated at 4.0 or greater using the hot ratings, whereas only 38% of shows reached that mark using the cool ratings.
The comparison of the hot versus cool show ratings shakes up the Top Five Summer 2018 shows rather thoroughly:
| Ranking | Hot Ratings | October 2018 Raters/New Raters | Cool Ratings | October 2018 Raters/New Raters |
| --- | --- | --- | --- | --- |
| 1 | Dick’s 1 | 521/164 | Alpharetta 3 | 397/95 |
| 2 | MPP 2 | 515/153 | Alpharetta 2 | 367/78 |
| 3 | Alpharetta 1 | 505/147 | BGCA 1 | 315/48 |
| 4 | Gorge 3 | 470/95 | BGCA 2 | 295/34 |
| 5 | Alpharetta 3 | 397/95 | Alpharetta 1 | 505/147 |
“New Raters” = # of raters between October 2018 and April 2020
It would have been nice if measured bias had been relatively constant (small variance) because then it could be ignored. The graph below, which simply repackages the data used in the bar graph above, shows that recency bias is not particularly stable.
Why do we observe a large bias for some shows and a smaller bias for other shows? Well, I tried running a few statistical models to explain the difference—controlling for free webcasts, the number of new raters, the initial rating, etc.—but nothing was obvious. If you have any suggestions as to why we see this pattern, let me know…
If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.
So maybe go by percentage of votes for each year. This will reflect the percentage of votes of people who were fans at the time. So if Phish just keeps going and the audience grows and grows, going by percentage of votes, instead of number of votes, year by year, can accommodate the growing audience and provide ratings that are reflective of the audience size voting. This will also offer some clarity for past years as well.
My theory would be that setlists are the biggest contributing factor to bias in the initial ratings. Some shows with strong setlists may also just be all around great shows, in which case their rating may not change much. You see this with the Alpharetta 2018 shows for example. This theory may not be entirely possible to test without being somewhat subjective, but based on my own personal experience I think setlist can have a huge impact in the way a show is viewed upon first impression.
That all being said, I have no idea why Alpha 1 dropped... that show kicked ass on paper and on relisten. Though it is still the top ranked show of 2018 summer tour, which it should be!
I just want to point out that I do go back and change ratings sometimes, but maybe I am an outlier.
In particular, after the Bakers Dozen, I reexamined how I rate shows, resulting in a stricter system that required lowering ratings on 2016 shows.
Somebody that listens to, say, a couple songs from MPP Tweezerfest and thinks, "Meh, doesn't have a 28+ minute jam and don't care much for Waiting All Night" or someone who looks at Charlotte 2019 setlist and song lengths and thinks "sucks they close first sets now with new songs" - to me - is less informed and useful than someone who attended.