Exploring Book Publishing Trends Through Data

Theo Zhang
September 2, 2022

I am an avid reader and a fan of well-organized data, so I decided to explore Goodreads using Athenic AI. Goodreads is a website where readers rate and discuss books, and it is the source of the data I analyze here.

The Trends

I first wanted to see how book lengths are distributed on Goodreads.

Based on this graph, most books are around 300 pages long. However, the top 10 most popular books by number of ratings average 470 pages. That is not surprising, since lengthy books such as Twilight by Stephenie Meyer and the Harry Potter series by J.K. Rowling are prominent in that ranking:

This ranking is based on the total number of ratings per book, including ratings left without a written review. Interestingly, the list of books with the most written reviews does not match it:

I hypothesize that certain books are more provocative than others and garner more written reviews as readers discuss their topics. For example, The Book Thief has the second-highest number of written reviews. It is a well-known, thought-provoking book set in an incredibly tumultuous and famous time period, which is likely why it has so many written reviews.
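For anyone who would rather reproduce these two comparisons in code, here is a minimal Python sketch. It is not how Athenic AI works internally. The field names (`num_pages`, `ratings_count`, `text_reviews_count`) are my assumptions about how the data might be laid out, and the rows are made-up placeholders rather than real Goodreads figures.

```python
from statistics import mean

# Placeholder rows for illustration only; these are not real Goodreads numbers.
books = [
    {"title": "Book A", "num_pages": 320, "ratings_count": 900, "text_reviews_count": 10},
    {"title": "Book B", "num_pages": 640, "ratings_count": 800, "text_reviews_count": 50},
    {"title": "Book C", "num_pages": 510, "ratings_count": 700, "text_reviews_count": 5},
    {"title": "Book D", "num_pages": 280, "ratings_count": 100, "text_reviews_count": 40},
    {"title": "Book E", "num_pages": 260, "ratings_count": 50, "text_reviews_count": 30},
]


def top_books(rows, key, n):
    """Return the n rows with the largest value for the given field."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


most_rated = top_books(books, "ratings_count", 3)
most_reviewed = top_books(books, "text_reviews_count", 3)

# Compare the average length of all books with that of the most-rated books.
avg_pages_all = mean(r["num_pages"] for r in books)
avg_pages_most_rated = mean(r["num_pages"] for r in most_rated)

# Titles that rank highly for written reviews but not for total ratings.
rated_titles = [r["title"] for r in most_rated]
only_in_reviewed = [r["title"] for r in most_reviewed if r["title"] not in rated_titles]

print(f"average pages, all books: {avg_pages_all:.0f}")
print(f"average pages, most-rated books: {avg_pages_most_rated:.0f}")
print("most reviewed but not most rated:", only_in_reviewed)
```

With the real dataset, the same two sorts would show whether the most-rated books run longer than average and which titles draw written reviews out of proportion to their rating counts.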

Book ratings showed a very clear trend, a predictable bell curve with surprising spikes at both ends:

Let’s take a closer look at the two spikes at either end that do not follow the bell curve structure and compare those to the peak of the bell curve.

A sample of 10 books that make up the higher-end spike:

A sample of 10 books that make up the lower-end spike:

A sample of 10 books with the most common average rating of 4 stars:

From these three tables, it is clear that books rated near the most common average of 4 stars have far more ratings than books at the two extremes. One plausible conclusion is that a larger number of ratings captures a more diverse range of opinions, which leads to less extreme averages.
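One way to check this pattern beyond a sample of 10 books per group is to bucket every book by its rounded average rating and compare the typical number of ratings in each bucket. The sketch below assumes fields named `average_rating` and `ratings_count`, and the rows are again invented placeholders rather than real data.

```python
from collections import defaultdict
from statistics import median

# Placeholder rows for illustration only; these are not real Goodreads numbers.
books = [
    {"average_rating": 5.0, "ratings_count": 2},
    {"average_rating": 5.0, "ratings_count": 4},
    {"average_rating": 1.0, "ratings_count": 1},
    {"average_rating": 1.2, "ratings_count": 3},
    {"average_rating": 3.9, "ratings_count": 12000},
    {"average_rating": 4.1, "ratings_count": 8000},
    {"average_rating": 4.2, "ratings_count": 30000},
]


def median_ratings_by_star(rows):
    """Group books by rounded average rating and return the median ratings_count per group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[round(row["average_rating"])].append(row["ratings_count"])
    return {star: median(counts) for star, counts in sorted(buckets.items())}


summary = median_ratings_by_star(books)
for star, typical_count in summary.items():
    print(f"{star} stars: median of {typical_count} ratings")
```

If the spikes at 1 and 5 stars really come from books with only a handful of ratings, the medians for those buckets should be tiny compared with the 4-star bucket. I use the median rather than the mean here so that a few blockbuster titles do not dominate a bucket.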

The top 20 publishers by number of books published:

Vintage is actually a subdivision of Penguin Random House, so the top three positions among the most prolific publishers all belong to the same company. This chart clearly shows Penguin Random House as a market leader.
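Because imprints are listed under their own names, a raw count per publisher understates the parent company's share. The sketch below rolls imprints up to their parent before counting. Only the Vintage mapping comes from this article; the `publisher` field name, "Example Press", and the rows are hypothetical placeholders.

```python
from collections import Counter

# Imprint -> parent company. Only the Vintage entry is taken from the article.
PARENT_COMPANY = {"Vintage": "Penguin Random House"}

# Placeholder rows for illustration only; these are not real Goodreads records.
books = [
    {"publisher": "Vintage"},
    {"publisher": "Vintage"},
    {"publisher": "Penguin Random House"},
    {"publisher": "Penguin Random House"},
    {"publisher": "Example Press"},
    {"publisher": "Example Press"},
    {"publisher": "Example Press"},
]


def books_per_publisher(rows, parent_of=None):
    """Count books per publisher, optionally replacing imprints with their parent company."""
    parent_of = parent_of or {}
    return Counter(parent_of.get(row["publisher"], row["publisher"]) for row in rows)


raw_counts = books_per_publisher(books)
rolled_up = books_per_publisher(books, PARENT_COMPANY)

print("by listed name:", raw_counts.most_common())
print("by parent company:", rolled_up.most_common())
```

In the placeholder data, the hypothetical Example Press leads when imprints are counted separately, but Penguin Random House moves to the top once Vintage is folded in.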

And lastly, the top 10 authors by number of books written:

None of these authors grace the previous two lists of the most reviewed books on Goodreads, but many are highly recognizable names.


It was fascinating to use the Goodreads data to visualize reader opinions and publishing trends so clearly. Additionally, using Athenic AI to organize and display these trends made the process incredibly simple, and even allowed me to make connections between different parts of the data that I would not have seen otherwise.

Check out the project to explore the dataset I used in more depth and see the original graphs here.

Check out the raw data here.

