DS6 Midcourse Project

This past month’s blog post was a no-brainer for me – writing about the midcourse project for the Data Science Bootcamp (DS6 with Nashville Software School), which occupied most of my free time. Also, Valve hasn’t released the new Dota 2 patch, Atomic Heart isn’t out yet, and refreshing the Elden Ring subreddit doesn’t magically release the DLC.

To cut to the chase, anyone interested can check out the final Shiny app, deployed here with plenty of explanations within (hosted on shinyapps.io):

Tomo Umer – Steam Games Analysis

What happened over the course of the past month (~ 12/21/2022 – 1/21/2023) was me going from the inception of the idea (very vague: some kind of data analysis on the video games on Steam by Valve) to data collection, cleanup, organization, deployment, discovery, and back to cleanup, organization, and so on.

I couldn’t be happier with the whole experience, with our instructor Michael Holloway providing expert guidance and help throughout, as well as great feedback from our two TAs, Rohit Venkat and Neda Taherkhani, on polishing and presenting the data in its final form.

One of the most challenging aspects of this project was figuring out where to get the data from and what to do with it. Valve, being a private company, only releases limited information to the public. So initially I attempted to use Python and web scraping, but ended up realizing that I had no good measure of what data I was getting. Essentially, the website in question had historic data on the popularity of video games, but it only showcased games that were still active. That meant that I ran straight into survivorship bias (as Michael helped me understand).

With that option out the window, I decided I was just going to grab everything the Steam APIs have to offer. They were incredibly helpful (nope, not at all). Even after I reached out to them by email, their response was a carbon copy of what is stated below:

Using the documentation available, I tried to determine what information I could obtain about games that could prove interesting to analyze. I’d have loved to have the historic data on the popularity of games, but oh well, it appeared all I could get was current information. After playing around with it I determined that between the list of publishers, developers, genres, categories (and a few other variables), I had enough information to uncover at least something interesting!

And that I did! I had a lot of fun finding details such as a game allegedly requiring people to be 120 years old, and documenting some examples like that in my R notebook on GitHub (TU2_games_analysis.Rmd). I also found out that some of the potential variables of interest were not available for large swaths of games (the number of recommendations and the Metacritic score being prime examples).

Slowly but surely I was getting a better and better idea of the data I had at hand, and once I started working on the Shiny app, I also decided to center the narrative around the release years of video games. That in itself proved to be a considerable challenge since release dates could apparently be anything (I had fun looking at that, showcased both in my notebook and under the “fun stuff” tab of my Shiny app). Figuring out how to coalesce that and make sense of it when presenting took a lot of effort, in particular because doing so turned the “Year” variable from a simple integer into a categorical variable (character).
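To make the "Year becomes a character" point a bit more concrete, here is a minimal Python sketch (not the R code from my notebook, and not necessarily how the app handles it). The sample strings and the function name `release_year_label` are made up for illustration. The idea is simply that once some release dates have no usable year, the column needs a label like "Unknown", and at that point it can no longer be a plain integer.

```python
import re

def release_year_label(raw):
    """Return a four-digit year as a string, or "Unknown" when none is found."""
    if raw is None:
        return "Unknown"
    # Look for a standalone year between 1900 and 2099 anywhere in the text.
    match = re.search(r"\b(19|20)\d{2}\b", str(raw))
    return match.group(0) if match else "Unknown"

# Hypothetical free-text release dates; these are not taken from the real data.
samples = ["Dec 21, 2022", "Q3 2023", "Coming soon", "", None, "2019"]
labels = [release_year_label(s) for s in samples]
print(labels)
```

Every label is a string, so values like "2022" and "Unknown" can sit side by side as categories, which is roughly the trade-off described above.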

Lastly, one of the most rewarding things that I did with Michael’s help was construct a network graph showcasing the strength of relationships between video game genres. Not only is the final result in my view pretty cool to look at (with options to select different layout algorithms), but also comprehending how to get there taught me a lot.
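For anyone curious what "strength of relationships" between genres could mean in practice, one common way to define it is to count how many games carry both genres. The sketch below is in Python with toy data and a hypothetical helper name (`genre_edge_weights`); it is only an assumption about one reasonable approach, not the code behind the graph in my app.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each game maps to the genres it is tagged with.
games = {
    "Game A": ["Action", "RPG"],
    "Game B": ["Action", "RPG", "Indie"],
    "Game C": ["Indie", "Strategy"],
    "Game D": ["Action", "Indie"],
}

def genre_edge_weights(games_to_genres):
    """Count how many games share each unordered pair of genres."""
    weights = Counter()
    for genres in games_to_genres.values():
        # Sort and de-duplicate so each genre pair is counted once per game
        # and always appears in the same order.
        for pair in combinations(sorted(set(genres)), 2):
            weights[pair] += 1
    return weights

edges = genre_edge_weights(games)
for (genre_a, genre_b), weight in edges.most_common():
    print(genre_a, genre_b, weight)
```

Each pair becomes an edge and the count becomes its weight, which is the kind of table a network layout algorithm can then draw.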

And on top of that, there is one thing in particular that I feel is important to highlight: all of this work and the time invested into this project was made possible … you guessed it … by my lifelong gaming. The same determination and willpower required to beat Malenia, Blade of Miquella in Elden Ring (multiple times, with different builds, see blog post about overcoming failure) is what I harnessed in tackling this project. When people get tired and need a break from the computer screen every couple of hours, I’m just getting warmed up. Countless times I’ve queued Dota 2 into the night with friends, making poor choices with my sleep and yet persisting through another match, gaining considerable mental fortitude in the process.

To be clear, there’s nothing wrong with just playing because it’s fun or because we’re otherwise going through difficult times in our personal lives and games help us retain some sanity. And I think pretty much everyone is aware that entertainment as a whole has that effect on us.

On the other hand, unless people have experience with video games (or have seen it with their own kids), the value of taking the muscle memory and focus from gaming and using them in real-life projects, in my view, often gets underestimated.

In practice, I found it incredibly easy to work through the “break” from our classes that we had over Christmas and New Year. The project was fun, challenging and engaging every step of the way, no matter the day. Further, I was (am) able to attend class over Zoom for 5 hours on a Saturday and then spend another 7 hours (minus some breaks for eating) coding away and analyzing. I don’t think I’d be able to pull off something like that without the experience (heh) I gained from video games.

That’s not to say that one can play games and immediately become an expert C++/R/Python/whatever connoisseur. Of course not! Only practice in those programming languages does that. I’m also not implying that the skills we gain in one area of our life are easily transferable. That in fact may be the most challenging aspect of this – recognizing and identifying our strengths and applying them to something else.

With our society being built around “value” and products, it’s easy to discount things that on the surface may not appear connected. It’s easy to just break people down into various skill sets that exist independently and pretend we understand everything there is to us.

In fact, let me use a practical example to showcase this with a fictional Bob and Jim:

 – Bob spent 40 hours learning how to program in Python and obtained a certificate. We deem that a worthy skill to put on a resume.

 – Jim spent 38 hours on, let’s say, playing League of Legends and 2 hours learning about coding in Python (no certificate).

Out of the two, to lots of recruiters and I’m sure potential employers too, Bob is likely the more appealing choice. But I can almost guarantee you Jim would be the smarter investment. He almost certainly types orders of magnitude faster than Bob (those insults when teammates suck don’t write themselves!), and his propensity to learn and adapt quickly in a dynamic environment will come in handy at almost any kind of workplace.

