[We would like to thank Paul Jakus (@paulj) of the Dept. of Applied Economics at Utah State University for this summary of research presented at the 2024 Phish Studies Conference. -Ed.]
As many of you know, the coding/architecture of Phish.Net is currently undergoing a major overhaul. While the ability to rate shows has been restored, we continue to study the ratings database with the goal of improving the accuracy (and credibility) of show ratings. This is the first in a series of four blog posts about that effort.
This post focuses on the raw ratings data.
All analysis is based on an anonymized database downloaded on October 26, 2023.
This date allows us to sidestep problems associated with possible ratings shenanigans in the aftermath of the NYE Gamehendge performance. Ratings for 592 dates that were soundchecks, TV appearances, side projects, false dates (on which no show was performed), and shows for which there is no surviving audiotape were deleted. The final data consist of 343,241 ratings from 16,452 users for 1,736 shows. The shows ranged from Phish’s first, December 2, 1983, through October 15, 2023.
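For readers who want to follow along, here is a minimal sketch of that cleaning step in Python/pandas. The file names and column names (`user_id`, `show_date`, `rating`, `rated_at`) are assumptions for illustration; the actual database schema may differ.

```python
import pandas as pd

# Load the anonymized ratings dump (file and column names are assumed).
ratings = pd.read_csv("ratings_2023-10-26.csv",
                      parse_dates=["show_date", "rated_at"])

# Hypothetical list of the 592 excluded dates: soundchecks, TV appearances,
# side projects, false dates, and shows with no surviving tape.
excluded = pd.read_csv("excluded_dates.csv", parse_dates=["show_date"])

clean = ratings[~ratings["show_date"].isin(excluded["show_date"])]

print(len(clean))                    # 343,241 ratings
print(clean["user_id"].nunique())    # 16,452 raters
print(clean["show_date"].nunique())  # 1,736 shows
```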
Show Ratings Are Tightly Concentrated at High Values
A smoothed distribution of show ratings (the simple average of all individual ratings for each show) appears below. Shows are heavily concentrated between 4.0 and 4.8, with a long tail of relatively few shows spread across the lower ratings (<4.0). A short computational sketch follows the list below.
· The mean show rating was 3.852 (median = 3.920).
· Some 806 of the 1,736 shows (46.4%) are rated as 4.0 or higher.
· If ranked in order, the difference in rating between the tenth highest-ranked show (Fukuoka 6/14/00, 4.636) and the show ranked #106 (Alpharetta 8/3/18, 4.539) is less than 0.1 points.
· The next difference of 0.1 points takes us all the way down to show #262.
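These summary figures can be reproduced along the following lines, continuing the assumed `clean` DataFrame from the sketch above (an illustration of the approach, not the site's production code):

```python
# Simple average rating per show, ranked from highest to lowest.
show_means = (clean.groupby("show_date")["rating"]
                   .mean()
                   .sort_values(ascending=False))

print(show_means.mean(), show_means.median())     # ~3.852, ~3.920
print((show_means >= 4.0).sum())                  # ~806 shows rated 4.0+

# Gap between ranks 10 and 106 (0-indexed positions 9 and 105).
print(show_means.iloc[9] - show_means.iloc[105])  # < 0.1 points
```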
Individual Rater Data Are Also Highly Concentrated
Every time you rate a show, it generates a new entry in the dataset consisting of your user name, the show date, the show rating, and the time of rating (to the nearest minute). The histogram below depicts the total number of show ratings, by rating value. Fewer than 10% of all show ratings were a ‘1’ or a ‘2’, whereas over 90% were a ‘3’, ‘4’, or ‘5’. This highly skewed distribution of individual ratings is what drives the skewed distribution of mean show ratings.
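The shares quoted above come from a simple tabulation; a sketch, again using the assumed `clean` DataFrame:

```python
# Count individual ratings by value (1-5) and convert counts to shares.
shares = clean["rating"].value_counts(normalize=True).sort_index()

print(shares.loc[[1, 2]].sum())     # under 10% of ratings are 1s or 2s
print(shares.loc[[3, 4, 5]].sum())  # over 90% are 3s, 4s, or 5s
```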
Our 16,452 raters represent only a fraction of all registered Phish.Net users. Among this group:
· The average number of shows rated was 21 (median = 4).
· Roughly 80% of all ratings were provided by about 3,300 people. In other words, about 20% of all raters, and only about 4% of all registered Phish.Net users, provide most of the ratings (a sketch of this calculation follows below).
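The concentration figure is a standard cumulative-share calculation; a sketch under the same assumptions as the earlier code:

```python
# Ratings per user, most active raters first.
per_user = (clean.groupby("user_id")["rating"]
                 .count()
                 .sort_values(ascending=False))

# How many top raters does it take to reach 80% of all ratings?
cum_share = per_user.cumsum() / per_user.sum()
n_top = (cum_share <= 0.80).sum() + 1

print(n_top)                  # ~3,300 raters
print(n_top / len(per_user))  # ~20% of the 16,452 raters
```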
The individual data also reveal interesting patterns; the table below reports descriptive statistics of the individual ratings, by numeric value.
· Most people have never rated a show as a ‘1’, ‘2’, or ‘3’.
· The average person has rated about one show as a ‘1’ or a ‘2’, about three shows as a ‘3’, six shows as a ‘4’, and ten shows as a ‘5’.
· The most ‘1s’ by a single rater (Rater A) was 393, while the most ‘5s’ by a single rater (Rater B) was 947.
Before jumping all over Rater A, let's learn a bit more about their full ratings profile. It turns out that Rater A has rated 939 performances and has used all five possible ratings, including nearly 300 shows rated as a ‘4’ or a ‘5’. On the other hand, Rater B has rated a total of 948 shows, 947 as ‘5’ and one as a ‘4’. Which of these Raters, A or B, discriminates among Phish performances?
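One simple way to put a number on "discrimination" is the spread of each rater's own ratings. The standard-deviation measure below is an illustrative choice on my part, not something the site computes; the sketch again assumes the `clean` DataFrame from above.

```python
# Per-rater profile: how many shows rated, and the spread of those ratings.
profile = clean.groupby("user_id")["rating"].agg(["count", "std"])

# A rater who uses the full 1-5 range (like Rater A) shows a large std;
# a rater who gives nearly everything a '5' (like Rater B) has std near 0.
heavy = profile[profile["count"] >= 100]    # limit to heavy raters
print(heavy.sort_values("std").head())      # least discriminating raters
```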
Raters A and B are only two (extreme) examples from the 16,452 people in the database, but they help illuminate issues regarding the “bombers & fluffers” debate. The next post [link?] will dig a bit deeper into different rating behaviors, and how those behaviors may affect the accuracy of show ratings.
If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.
The short answer is, "I don't know." The analysis was based on an anonymized dataset for which I've obtained IRB approval from my university, so I'm trying to follow proper research protocol. I've done Phish stats for several years now, mostly using publicly available data, and I was pleased that the site Admins trusted me with the raw data for this work.
That said, if you have econometric questions, send them to me via PM. The variables are (anonymized) User ID, show date, rating, and time of rating.
@greenarrow74:
Oh, man, I'd love to control for attendance bias! That's on my list of things to ask the coders about. Last fall I proposed a split-sample test of the "herding effect" during the NYE run (herding will be defined in Post #3), but it was simply going to be too much, too soon.
@grizzwold:
You do You, man.
Keep the comments coming
I'd also motion for the ability to exclude ratings based on certain criteria. For me personally, I would want to exclude any user who has given more than X number of 0s, or any user whose average show rating is below 2, etc.
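For what it's worth, a filter like that is easy to sketch; here is one hypothetical version in pandas, with made-up thresholds (`max_low_votes`, `min_mean`) and the same assumed column names as the post's examples:

```python
# Per-user summary: count of lowest ratings given, and mean rating.
# (The site scale is 1-5, so the commenter's "0s" are read here as '1s'.)
per_user = clean.groupby("user_id")["rating"].agg(
    n_lowest=lambda r: (r == 1).sum(),
    mean_rating="mean",
)

# Hypothetical thresholds a user-configurable filter might expose.
max_low_votes, min_mean = 50, 2.0
keep = per_user[(per_user["n_lowest"] <= max_low_votes) &
                (per_user["mean_rating"] >= min_mean)].index
filtered = clean[clean["user_id"].isin(keep)]
```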
I don't have as much free time these days, so it's nice to be able to get to the end of a tour, look at the Top Rated shows of the tour/year, and start with those when I get a hankering for listening to some new Phish.
And I also don't know that I ever would have gotten as hooked as I did without the all-time Top Rated list. Sure, the ranking system will never be perfect for everyone and there will be some bad actors, but I don't think anyone would tell me with a straight face that the top 50 shows aren't a list of verified barn-burners, even if their personal favorite is excluded from that list or they think some of them are overrated.
I'm so grateful for all the folks who have contributed to this project. Being a data nerd and music nerd, the song histories, ratings, forums, stats, etc have all genuinely changed my life for the better.
Second, instead of a 1-5 or 1-10 scale where reviewers have to choose a whole number, have you all considered offering a scale that allows for more nuance within that most commonly hit zone of 3-5? The simplest version might offer these choices: 1, 2, 3, 3.5, 4, 4.5, 5. There are lots of different ways you could go at it, but something along these lines would give careful reviewers a chance to provide more measured, nuanced input, which may end up with aggregate ratings more aligned with actual sentiment. Just a thought!
Agree with others above that a wider range of rating options would help differentiate things better, whether it's a 1-10 scale or 1-5 with half-point increments. This would help break up some of that logjam of shows clustered around the 4.0 mark and better sort the truly great from the average-great shows.
It would also be way more complicated, and I'm not sure how it would work, but it would be interesting to see how things skew based on how raters consume the shows: live in person, couch tour with a video stream, LivePhish/SiriusXM soundboard, or an audience recording. With attendance bias, would a show that got a 4.5 from live, in-person voters be as "good" as a show that got a 4.5 from people experiencing just the soundboard audio?
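If a consumption-mode field were ever collected, the comparison would be a simple split. A hypothetical sketch follows; the real dataset has no such `mode` column, so the data here are entirely made up:

```python
import pandas as pd

# Hypothetical ratings tagged by how each rater experienced the show.
df = pd.DataFrame({
    "show_date": ["2023-07-14"] * 4,
    "mode": ["in_person", "in_person", "soundboard", "aud"],
    "rating": [5, 4, 4, 3],
})

# Mean rating per show, split by consumption mode.
by_mode = df.groupby(["show_date", "mode"])["rating"].mean().unstack("mode")
print(by_mode)  # compare the in-person mean with the soundboard-only mean
```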
Grade on a curve.
1997-12-07 and 2024-08-17 don't deserve to be in the same breath.
Phish.net should publicly list WaxBanks and NOOB100 reviews
Long live '97-2000 when Phish was the baddest touring act on the planet.
Downvote away.