Thursday 08/22/2024 by phishnet

PHISH.NET SHOW RATINGS, PART 1: AN INTRODUCTION TO THE RATINGS DATABASE

[We would like to thank Paul Jakus (@paulj) of the Dept. of Applied Economics at Utah State University for this summary of research presented at the 2024 Phish Studies Conference. -Ed.]

As many of you know, the coding/architecture of Phish.Net is currently undergoing a major overhaul. While the ability to rate shows has been restored, we continue to study the ratings database with the goal of improving the accuracy (and credibility) of show ratings. This is the first in a series of four blog posts about that effort.

This post focuses on the raw ratings data.

All analysis is based on an anonymized database downloaded on October 26, 2023.

This date allows us to sidestep problems associated with possible ratings shenanigans in the aftermath of the NYE Gamehendge performance. Ratings for 592 dates that were soundchecks, TV appearances, side projects, false dates (on which no show was performed), and shows for which there is no surviving audiotape were deleted. The final data consist of 343,241 ratings from 16,452 users for 1,736 shows. The shows ranged from Phish’s first, December 2, 1983, through October 15, 2023.
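A show's headline number is simply the average of its individual ratings. A minimal Python sketch of that aggregation, using made-up rows in place of the anonymized ratings records described above:

```python
from collections import defaultdict

def mean_show_ratings(ratings):
    """Average the individual ratings for each show.

    `ratings` is a list of (user_id, show_date, rating) tuples — a
    simplified, invented stand-in for the anonymized database.
    """
    totals = defaultdict(lambda: [0.0, 0])  # show_date -> [sum, count]
    for _user, show, value in ratings:
        totals[show][0] += value
        totals[show][1] += 1
    return {show: s / n for show, (s, n) in totals.items()}

# Toy data: two shows, three ratings
sample = [
    ("u1", "1997-12-07", 5),
    ("u2", "1997-12-07", 4),
    ("u1", "2023-10-15", 3),
]
print(mean_show_ratings(sample))  # -> {'1997-12-07': 4.5, '2023-10-15': 3.0}
```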

Show ratings are tightly concentrated at high values.

A smoothed distribution of show ratings (the simple average of individual ratings) appears below. Shows are heavily concentrated between 4.0 and 4.8, with a long tail of relatively few shows spread across the lower ratings (<4.0).

Graph 1

· The mean show rating was 3.852 (median = 3.920).

· Some 806 of the 1,736 shows (46.4%) are rated as 4.0 or higher.

· If ranked in order, the difference in rating between the tenth highest-ranked show (Fukuoka 6/14/00, 4.636) and the show ranked #106 (Alpharetta 8/3/18, 4.539) is less than 0.1 points.

· The next difference of 0.1 points takes us all the way down to show #262.
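The tight packing near the top means that large jumps in rank correspond to tiny differences in rating. A small illustration of that rank-gap calculation, using invented mean ratings rather than the real ones:

```python
def rank_gap(show_means, a, b):
    """Difference in mean rating between the a-th and b-th ranked shows (1-indexed)."""
    ordered = sorted(show_means.values(), reverse=True)
    return ordered[a - 1] - ordered[b - 1]

# Toy means, packed 0.001 apart: jumping nearly 100 ranks moves the
# rating by less than 0.1 points, as with the real top of the list.
means = {f"show{i}": 4.8 - 0.001 * i for i in range(300)}
print(round(rank_gap(means, 10, 106), 3))  # -> 0.096
```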

Individual Rater Data Are Also Highly Concentrated

Every time you rate a show, it generates a new entry in the dataset consisting of your user name, the show date, show rating, and time of rating (to the nearest minute). The histogram below depicts the total number of show ratings, by rating value. Fewer than 10% of all show ratings were a ‘1’ or a ‘2’, whereas over 90% were a ‘3’, ‘4’, or ‘5’. The highly-skewed distribution of individual ratings is what drives the skewed distribution of mean show ratings.

Graph 2
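Shares like those in the histogram come from a simple count of ratings by value. A toy sketch (the sample list is invented, not the real data):

```python
from collections import Counter

def rating_shares(values):
    """Fraction of all individual ratings at each value 1-5."""
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in range(1, 6)}

# Toy sample skewed toward high ratings, like the real histogram
sample = [5, 5, 4, 4, 4, 3, 3, 2, 1, 5]
shares = rating_shares(sample)
print(shares)  # 1s and 2s are 20% of this sample; 3-5 are 80%
```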

Our 16,452 raters represent only a fraction of all registered Phish.Net users. Among this group,

· The average number of shows rated was 21 (median = 4).

· Roughly 80% of all ratings were provided by about 3,300 people. This means that about 20% of all raters, and only about 4% of all registered Phish.Net users, provide most of the ratings.
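A concentration figure like "80% of ratings from a small core of raters" comes from sorting raters by activity and accumulating their counts. A sketch with invented per-rater rating counts:

```python
def min_raters_for_share(counts, share=0.8):
    """Smallest number of raters (most active first) whose ratings
    cover at least `share` of all ratings submitted."""
    total = sum(counts)
    running, needed = 0, 0
    for c in sorted(counts, reverse=True):
        running += c
        needed += 1
        if running >= share * total:
            return needed
    return needed

# Toy counts: a few heavy raters plus many one-show raters
counts = [100, 80, 60, 40] + [1] * 70  # 350 ratings total
print(min_raters_for_share(counts))  # -> 4 (they cover 280/350 = 80%)
```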

The individual data also reveal interesting patterns; the table below reports descriptive statistics of the individual ratings, by numeric value.

Chart 1

· Most people have never rated a show as a ‘1’, ‘2’, or ‘3’.

· The average person has rated about one show as a ‘1’ or a ‘2’, about three shows as a ‘3’, six shows as a ‘4’, and ten shows as a ‘5’.

· The most ‘1s’ by a single rater (Rater A) was 393, while the most ‘5s’ by a single rater (Rater B) was 947.

Before jumping all over Rater A, let's learn a bit more about their full ratings profile. It turns out that Rater A has rated 939 performances and has used all five possible ratings, including nearly 300 shows rated as a ‘4’ or a ‘5’. On the other hand, Rater B has rated a total of 948 shows, 947 as ‘5’ and one as a ‘4’. Which of these Raters, A or B, discriminates among Phish performances?

Raters A and B are only two (extreme) examples from the 16,452 people in the database, but they help illuminate issues regarding the “bombers & fluffers” debate. The next post will dig a bit deeper into different rating behaviors, and how those behaviors may affect the accuracy of show ratings.

If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.


Comments

comment by dipped
Any chance the database will be made available for individuals to perform their own econometric analysis?
comment by Greenarrow74
Interesting stuff and thanks for sharing. One thing I've always felt would be important to distinguish is whether or not a rater was in attendance at a show. We all inherently feel there is attendance bias...it would be great to see some data on that. Can you show us two ratings, one from those who attended and one from those who hadn't?
comment by Grizzwold
Every show I went to was a 5 and all the other ones are 3s at best. Am I doing it right?
comment by paulj
@dipped:

The short answer is, "I don't know." The analysis was based on an anonymized dataset for which I've obtained IRB approval from my university, so I'm trying to follow proper research protocol. I've done Phish stats for several years now, mostly using publicly available data, and I was pleased that this work led the site Admins to trust me with the raw data.

That said, if you have econometric questions, send them to me via PM. The variables are (anonymized) User ID, show date, rating, and time of rating.

@greenarrow74:
Oh, man, I'd love to control for attendance bias! That's on my list of things to ask the coders about. Last fall I proposed a split sample test of the "herding effect" during the NYE run (herding will be defined in Post #3), but it was simply going to be too much, too soon.

@grizzwold:

You do You, man.

Keep the comments coming
comment by fukuoka_gumbo
Would you ever consider changing the scale to a 10-point scale and changing the existing 5s to 10s, 4s to 8s, etc.?

I'd also motion for the ability to exclude ratings based on certain criteria. Personally, I would want to exclude any user who has used more than X amount of 0s, or whose average show rating is below 2, etc.

I don't have as much free time these days so it's nice to be able to get to the end of a tour and look at the Top Rated shows of the tour/year and start with those when I get a hankering for listening to some new phish

And I also don't know that I ever would have gotten as hooked as I did without the all-time Top rated list -- sure, the ranking system will never be perfect for everyone and there will be some bad actors, but I don't think anyone would try and tell me with a straight face that the top 50 shows isn't a list of verified barn-burners, even if their personal favorite is excluded from that list or they think some of them are overrated.

I'm so grateful for all the folks who have contributed to this project. Being a data nerd and music nerd, the song histories, ratings, forums, stats, etc have all genuinely changed my life for the better.
comment by FrontMan
First, I appreciate all you guys and gals do to support the Phish community and make it even more fun with Phish.net. It is a beloved resource.
Second, have you all considered, instead of a 1-5 or 1-10 scale where reviewers have to choose a whole number, offering a scale that allows for more nuance within that most commonly hit zone of 3-5? Most simple might be offering these choices: 1, 2, 3, 3.5, 4, 4.5, 5. Lots of different ways you could go at it, but something along these lines would give careful reviewers a chance to provide more measured/nuanced input, which may yield aggregate ratings more aligned with actual sentiment. Just a thought!
comment by johnmd750
Thanks for sharing this. Interesting stuff.

Agree with others above that a wider range of rating options would help differentiate things better, whether it's a 1-10 scale or 1-5 with half-point increments. This would help break up some of that logjam of shows all clustered around the 4.0 mark, and better sort the truly great from the average-great shows.

Also, it would be way more complicated and I'm not sure how it would work, but it would be interesting to see how things skew based on how raters consume the shows: live in person, couch tour w/ video stream, LivePhish/SiriusXM soundboard, or audience recording. With attendance bias, would a show that got a 4.5 based on live in-person voters be as "good" as a show that got a 4.5 from people experiencing just the soundboard audio?
comment by digunderrocks
I used to run panels at the National Science Foundation, which also rates on a scale of 1-5, and the most commonly used expression was "USE THE RANGE"!
comment by nole095
Why not separate the top show page by era? Phish 1.0, 2.0, 3.0, etc.
comment by Phunkaddict
First, thanks so much for running this site, it's such an awesome thing for us phans! So grateful, truly. As for ratings, I have often thought, like fukuoka_gumbo, that a 10-point rating system would be helpful in providing more accurate and nuanced ratings. I think part of the reason people like Rater B may have so many 5s, and why some people (like me) haven't rated shows at all, is that so many shows feel better than 4/5 but not quite worthy of a perfect 5/5, and giving them a 4 seems a disservice. Many feel like a 4.5/5 or 9/10, or maybe 7/10 for a solid but not great show. Most phans I know feel that Phish is consistently very good, so ratings are going to be skewed toward higher scores. We love to have friendly debates about why Night 2 was better than Night 3 and how both were better than N1 and N4, but how those were also really good. This is easier to translate to numerical ratings if we can give them 8, 8, 9, & 10, for example. I'm sure that would take more work on your part, but just wanted to chime in with my thoughts on the subject. Thanks again!!!
comment by nickpop
It would be nice to get more than 5 stars to choose from. I often feel like shows I go to are maybe better than a 3, but maybe not a 4, or not quite a 5, but definitely better than a 4, etc. Would be nice to have a 1-100 system, or at least 1-10.
comment by mcgrupp81
@nole095 said:
Why not separate the top show page by era? Phish 1.0, 2.0, 3.0, etc.
This is the way.
comment by jdonovan
@Greenarrow74 said:
Interesting stuff and thanks for sharing. One thing I've always felt would be important to distinguish is whether or not a rater was in attendance at a show. We all inherently feel there is attendance bias...it would be great to see some data on that. Can you show us two ratings, one from those who attended and one from those who hadn't?
This would be fascinating to see lol. I'd imagine a major difference in ratings for shows that aren't widely acclaimed as objective bangers.
comment by hsihdthai
The heaters of 1.0 can't even be sniffed by the best of 3.0 & 4.0.

Grade on a curve.

1997-12-07 and 2024-08-17 don't deserve to be in the same breath.

Phish.net should publicly list WaxBanks and NOOB100 reviews.

Long live '97-2000 when Phish was the baddest touring act on the planet.

Downvote away.


Phish.net

Phish.net is a non-commercial project run by Phish fans and for Phish fans under the auspices of the all-volunteer, non-profit Mockingbird Foundation.

This project serves to compile, preserve, and protect encyclopedic information about Phish and their music.


© 1990-2024  The Mockingbird Foundation, Inc. | Hosted by Linode