Posts Tagged ‘statistics’


New Post Series: State of the Game

Ever since I started reviewing one game a week I’ve eternally struggled with managing my time to make that once-a-week post. As many of my fellow adult gamers will attest, time is an ever-shrinking commodity, especially when it comes to leisure. So I started looking for games I knew I could finish in a week, something with a play time of maybe 4 to 8 hours, but the more I looked for play time stats the more I came up short. There are resources out there, of course (How Long to Beat being the shining star among them), but for new games, the ones people are most interested in seeing reviews of, play times aren’t known until the games have been out for some time. So I did what any average coder would do: I started building my own solution.


Long story short: I’ve been collecting various bits of data from Steam’s Web API for some time now, and it’s reliable enough to provide some insight into games that are released through Steam. Whilst there are numerous aspects I could dive into, I felt a brief, concise weekly infographic would be an interesting exercise, one that would hopefully spur on further conversations about why games were popular, what developers are doing right and, of course, what they’re doing wrong. With that in mind I’ve spent the last month wrangling my data into a usable format and putting it into something digestible, the first of which you can see here (and the second of which precedes this post).
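I haven't said which endpoints my collector calls, so treat the following as a minimal sketch only, assuming Steam's public `IPlayerService/GetOwnedGames` endpoint, which reports each game's lifetime `playtime_forever` in minutes. The helper names and the placeholder key and Steam ID are hypothetical, and the parsing is kept separate from the network call so it can be tried on a saved response.

```python
import json
from urllib.parse import urlencode

# Steam Web API endpoint for a player's library; key and steamid are placeholders.
OWNED_GAMES_URL = "https://api.steampowered.com/IPlayerService/GetOwnedGames/v1/"


def owned_games_url(api_key: str, steam_id: str) -> str:
    """Build a GetOwnedGames request URL for one profile (hypothetical helper)."""
    params = {"key": api_key, "steamid": steam_id, "include_appinfo": 1, "format": "json"}
    return f"{OWNED_GAMES_URL}?{urlencode(params)}"


def playtime_hours(response_text: str) -> dict[int, float]:
    """Map appid -> lifetime hours played from a GetOwnedGames JSON response.

    playtime_forever is reported in minutes. A response with no "games"
    list (assumed here for private profiles) yields an empty dict.
    """
    games = json.loads(response_text).get("response", {}).get("games", [])
    return {game["appid"]: game.get("playtime_forever", 0) / 60 for game in games}
```

Fetching is then a matter of requesting `owned_games_url(...)` with any HTTP client (for example `urllib.request.urlopen`) and passing the body to `playtime_hours`.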

As with all infographics there’s a lot to talk about in the data I’m presenting, so this post will attempt to provide some insight into what you’re seeing, some of the decisions I made in presenting the data and why some things might not line up exactly with your expectations.

The data I’m using is all publicly available from Steam profiles. If you’ve set your profile to private I can’t see anything: not your achievements, your friends or even any kind of play time stat. That being said, if you’d like to be excluded from my data collection, just shoot me a message and I’ll ensure you’re not included in any future collection activities.

I’ve chosen to do this retrospectively as it takes around 2 weeks for most games to accumulate good, reliable data. Thus these reports will always be 3 weeks in the past, giving every game released in that window a minimum of 2 weeks of data collection before I make any results public. Typically this means I have a sample size of around 200,000 players to work with, which I think is large enough to be representative of the wider Steam community. Indeed, the few sample runs I did before publishing any of them seemed to line up with my expectations, although I’ll be the first to admit that my statistical analysis skills have diminished quite a bit since my university days.
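The three-week lag boils down to simple date arithmetic. This sketch assumes reporting weeks run Monday to Sunday (as the 26/09/2016 to 02/10/2016 title suggests) and that a report is prepared in the week starting three weeks after its window began; the function names are hypothetical.

```python
from datetime import date, timedelta


def report_window(today: date, weeks_back: int = 3) -> tuple[date, date]:
    """Monday-to-Sunday release window covered by a report prepared on `today`."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = this_monday - timedelta(weeks=weeks_back)
    return start, start + timedelta(days=6)


def collection_days(today: date, window_end: date) -> int:
    """Days of data the latest release in the window has had by `today`."""
    return (today - window_end).days
```

For a report prepared on Monday 17 October 2016 this gives 26 September to 2 October 2016, and games released on the last day of that window have had 15 days of collection, clearing the two-week minimum.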

The first section contains some quick stats about the week, with a comparison to the previous week’s stats. The one part which might get reworked in future versions is the “Top Genre”, as this is simply based on the number of games released in each genre that week. Indie seems to dominate this pretty much every week, so if there’s another high-level stat you’d like to see included I’d very much like to hear your ideas.
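Since “Top Genre” is just a count of releases per genre, it amounts to a tally like the one below, alongside a plain week-over-week percentage change. This is a sketch assuming each release record carries a list of genre tags (the `genres` field name is hypothetical); a game tagged with several genres counts once towards each.

```python
from collections import Counter


def top_genre(releases):
    """Genre with the most releases this week, as (genre, number_of_games)."""
    # set() stops a game that lists the same tag twice from counting twice
    counts = Counter(genre for game in releases for genre in set(game["genres"]))
    return counts.most_common(1)[0]


def percent_change(current, previous):
    """Week-over-week change, as a percentage of last week's figure."""
    return (current - previous) * 100 / previous
```

Because a single Indie tag is enough to count, it's easy to see why Indie tops this tally most weeks.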

The second section is the top 5 games released that week, ranked by total hours played (that I’ve observed). This does mean that Early Access games, especially those that have spent a long time in the program, tend to stand out; however, I’ve made the decision to include them for a couple of reasons. For starters, it does mean the game is popular, and most of the time even the biggest Early Access games still lose out to big AAA releases. Additionally, looking at the stats for some popular Early Access games that do eventually “release”, they’re usually quite popular in their release week as well. If I find that’s not the case, however, I reserve the right to remove them in favour of more deserving releases (although that hasn’t happened yet).
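Ranking by observed hours is a sum per game followed by a top-n selection. In this sketch the input shape, a stream of `(appid, hours)` records plus the set of appids released in the window, is an assumption, and Early Access titles get no special treatment, in line with the decision above.

```python
import heapq
from collections import defaultdict


def top_games(observations, released_this_week, n=5):
    """Top n of this week's releases by total observed hours played.

    observations: iterable of (appid, hours) play time records
    released_this_week: set of appids released in the window
    """
    totals = defaultdict(float)
    for appid, hours in observations:
        if appid in released_this_week:  # Early Access "releases" are not treated specially
            totals[appid] += hours
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])
```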

The map (which you can only access by visiting the blog) uses the same metrics as the previous section but at a regional level. If you’ve put your country in your Steam profile then it’s possible for me to see that and tag your play time with that region. The data I have is a little more fine-grained than what I’m presenting here, as Piktochart has around 170 countries listed whilst Steam has around 250. There are also a couple of countries for which I don’t have equivalent data, so they’re unfortunately blank. However, this does serve as a good way to see which games are popular where and how countries differ from each other.
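Tagging play time by region is essentially a group-by on each profile's country followed by a lookup into the smaller set of countries the map tool can draw. This sketch assumes the country arrives as a two-letter code (Steam's `GetPlayerSummaries` exposes it as `loccountrycode`); codes the map can't draw are simply dropped, which is why those countries end up blank. The optional `aliases` mapping for folding codes together is a hypothetical extra.

```python
from collections import defaultdict


def hours_by_map_country(observations, map_countries, aliases=None):
    """Sum play time per (country, game) for the countries the map can draw.

    observations: iterable of (country_code, appid, hours); country_code is
        None when the player hasn't set a country on their profile.
    map_countries: set of two-letter codes the charting tool knows about.
    aliases: optional {steam_code: map_code} for folding codes together.
    """
    aliases = aliases or {}
    totals = defaultdict(lambda: defaultdict(float))
    for code, appid, hours in observations:
        if code is None:
            continue  # no country on the profile, so it can't be placed on the map
        code = aliases.get(code, code)
        if code in map_countries:  # anything else is dropped and renders blank
            totals[code][appid] += hours
    return {country: dict(games) for country, games in totals.items()}


def top_game_per_country(totals):
    """Most-played game in each country."""
    return {country: max(games, key=games.get) for country, games in totals.items()}
```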

The trending section is the only one which branches out from games released that week (in fact it excludes them from the list). Trend scores are calculated using a Z score that compares the average number of players in the game during that week to the month that preceded it. The higher the Z score, the higher the game appears in the list. To qualify, a game must have attracted an average player base of more than 100 (to filter out games that went from, say, 0 players to 10, which gives a massive score) and maintained it for the majority of the week (to rule out games which trend for all of an hour, which isn’t much of a trend). The reason for each trend is the most manual part of this infographic, as I have to hunt down exactly what caused the game to trend, which can be rather esoteric on occasion.
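The score itself is Z = (week average − month mean) / month standard deviation. Here is a minimal sketch of that plus the two qualifying filters, assuming player counts are sampled at regular intervals (say hourly), that the sample standard deviation is used, and reading “maintained it for the majority of the week” as more than half of the week's samples staying above 100. Those readings and all the names are my assumptions rather than a description of the exact pipeline.

```python
from statistics import mean, stdev

MIN_AVG_PLAYERS = 100  # threshold from the post: filters out 0 -> 10 style jumps


def trend_score(week, month, min_avg=MIN_AVG_PLAYERS):
    """Z score of this week's average players against the preceding month.

    week, month: player-count samples (e.g. hourly) for each period.
    Returns None when the game doesn't qualify for the list.
    """
    week_avg = mean(week)
    if week_avg <= min_avg:
        return None
    # Assumed reading of "maintain it for the majority of the week":
    # more than half of the week's samples must also exceed the threshold.
    if sum(count > min_avg for count in week) * 2 <= len(week):
        return None
    sigma = stdev(month)  # sample standard deviation; needs >= 2 samples
    if sigma == 0:
        return None  # a perfectly flat month leaves the Z score undefined
    return (week_avg - mean(month)) / sigma


def trending(games, released_this_week):
    """Rank games by Z score, excluding the week's new releases.

    games: {appid: (week_samples, month_samples)}
    """
    scores = {}
    for appid, (week, month) in games.items():
        if appid in released_this_week:
            continue
        z = trend_score(week, month)
        if z is not None:
            scores[appid] = z
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A game that spikes to 1,000 players for one hour but sits at 50 for the rest of the week clears the average test yet fails the majority test, which is exactly the one-hour trend the filter is meant to rule out.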

The last section is a couple of graphs of data that vary substantially week to week and can speak to how the week’s games are faring with the wider community. The hours spent playing graph, for example, shows the relative percentage of time players spent playing. The first section will always be the largest, however the sizes of the sections change week to week. For instance, I’ve seen the first section span from 0 to 34 hours played in one week whilst in others it only spans 0 to 5. Weeks with broader sections would indicate that the games released that week were capturing players for longer. Similarly, the last graph gives an indication of which genres were most popular, in terms of play time, during that week.
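The genre graph is a share-of-total calculation over the week's play time. How multi-genre games are handled isn't stated above; this sketch assumes their hours are split evenly across their genres so the shares add up to 100%. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def genre_share(playtime, genres_of):
    """Percentage of the week's observed play time spent in each genre.

    playtime: {appid: hours played during the week}
    genres_of: {appid: [genre, ...]}. A multi-genre game's hours are split
        evenly across its genres (an assumption) so the shares sum to 100.
    """
    hours = defaultdict(float)
    for appid, played in playtime.items():
        tags = genres_of.get(appid) or ["Unknown"]
        for genre in tags:
            hours[genre] += played / len(tags)
    total = sum(hours.values())
    if total == 0:
        return {}
    ranked = sorted(hours.items(), key=lambda item: item[1], reverse=True)
    return {genre: 100 * h / total for genre, h in ranked}
```

Counting a multi-genre game in full for every genre instead would inflate broad tags like Indie, which is the same weakness the “Top Genre” stat has.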

So, from now on, I’ll be providing these infographics once per week. I’m keen to hear what you like about them, what extra information you’d like to see and what changes you’d make to them. If you’ve got any questions or comments, feel free to hit me up on Twitter or through my public email address.

State of the Game: 26/09/2016 to 02/10/2016

Affordability, Statistics and the Australian Housing Market.

Before I get into what could be a slightly ranty post about the Australian property market, I feel it’s prudent to mention that I’m an owner-occupier and an investor, and would be regarded as particularly well off when compared to the average Australian. Thus my views may be somewhat skewed by the fact that I have a vested interest in the property market. However, I believe there’s a lot of disinformation out there about housing prices and what constitutes “affordable” property, especially when the entire market is boiled down to single figures. What I intend to show you is that whilst Australian property is more than likely above fair value, this does not preclude the average Australian family from owning their own home, nor are first home buyers priced completely out of the market.

There’s been a report circulating recently from NATSEM that says we’ll need a decade of flat housing prices for them to return to affordable levels. This sparked quite the reaction in the media, though it was strangely lacking the direct finger-pointing that usually accompanies issues like this. There’s no question that the last decade has seen some extremely wild growth in the Australian property market, and for years people have been predicting its ultimate downfall. The Global Financial Crisis was supposed to be the trigger that sent property prices tumbling, but it had the opposite effect, with extremely low interest rates pulling many into the market and increasing demand significantly. Now that the pressure is back on, with interest rates at their pre-GFC levels, the question of affordable housing is a hot topic, but it’s not all bad news for those chasing the Australian dream.

For starters let’s dive into the (thankfully unbiased) figures from the NATSEM report. On the surface it looks bad for Australia, with the median¹ house price a whopping 7.3 times the median income, 50% higher than the ratio back in 2001. However, whilst I believe the median is far more intellectually honest than other measures, it does hide some important information from the reader. If the median Australian house price is $417,000, that also means 50% of all Australian houses are valued somewhere below that line. For first home buyers this means they shouldn’t be aiming to buy a house at the median price, since there is ample stock available in much cheaper price brackets. The houses above the median are usually better suited to those looking to upgrade, not those trying to break into the market.

For interest’s sake I’ve done some calculations based on some typical scenarios. The first is a median income earner attempting to buy a median house with a typical interest rate:

  • Income: $57,000 /year, $3,823.33 / month after tax.
  • Home loan: $396,150 ($417,000 house price, 5% deposit, 7.1% interest, principal and interest) = $2,662.25 / month
  • Repayment as a percentage of total income: 69.63%
In this situation I agree with NATSEM that this is completely unaffordable. I believe this is a fairly atypical situation, however, as a single person buying (or financing) a house should definitely not be aiming for a median property, opting instead for the lower end of the spectrum with smaller town houses or apartments. A more typical scenario would be a young, childless couple (I’ll stick with median incomes) looking for their first home and aiming for the lower end so they can break into the market. Taking these factors into consideration we get:
  • Income: $114,000 /year, $7,646.66 / month after tax.
  • Home loan: $356,250 ($375,000 house price, 5% deposit, 7.1% interest, principal and interest) = $2,394.11 / month
  • Repayment as a percentage of total income: 31.31%
In this situation it starts to look a lot better, with the percentage of total income spent on housing much closer to the 30% that banks usually use when determining loan size. The above scenario isn’t too far from the situation I was in when I purchased my first house back in 2007, and whilst it wasn’t the easiest thing in the world to do (it was helped a lot by renting out the spare rooms), it was definitely possible. This doesn’t disprove the point that Australian house prices are unaffordable for median single-income earners, but even in 2001 it would’ve been a struggle.
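For anyone wanting to try their own scenarios, repayments like these come from the standard amortising-loan formula, M = P·r / (1 − (1 + r)^−n), where P is the loan, r the monthly interest rate and n the number of monthly payments. I didn't state a loan term above; a 30-year term reproduces the first scenario's $2,662.25, so this sketch assumes it. The function names are just for illustration.

```python
def monthly_repayment(principal, annual_rate, years=30):
    """Principal-and-interest repayment on an amortising loan, paid monthly."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


def scenario(house_price, monthly_income, deposit=0.05, annual_rate=0.071, years=30):
    """Loan size, monthly repayment and repayment as a % of after-tax income."""
    loan = house_price * (1 - deposit)
    payment = monthly_repayment(loan, annual_rate, years)
    return loan, payment, 100 * payment / monthly_income
```

For example, `scenario(417_000, 3823.33)` gives a loan of $396,150 and a repayment of about $2,662.25 a month, or 69.63% of after-tax income.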

Since the media hasn’t played the blame game yet, I thought I’d throw my hat into the ring. Investors who are negatively gearing would be an easy target here, and they’re usually the first to be blamed for high housing prices. However, in Australia the vast majority of property, to the tune of 68.90%, is owner-occupied (i.e. the people who own it live in it). The remaining 31.10% is held by investors, but the vast majority of investors own only 2 properties: their home and one investment. It then seems implausible for investors to be solely responsible for housing price gains when the vast majority of property is in the hands of owner-occupiers or one-time investors. The price rises must then logically come from the majority, but how are they doing so?

Simply put, it’s people leveraging the equity in their own homes to upgrade to a bigger, better home whilst keeping their loan repayments at a similar level. The initial 2001–2004 boom meant that many had enough equity to upgrade, and many did so over the years. Of course, being rational actors, they attempted to maximize their sale price in order to reduce the loan on the next property, and this put upward pressure on housing prices at both the low end (the one they were selling) and the high end (the one they were buying) of the market. The interest rate scare of 2007–2008 put enough pressure on people to curtail this behaviour for a while, but the GFC dashed those high rates and the upgrades began again in earnest.

I’ve long been of the opinion that there will never be a house price crash; instead I foresee a long period of stagnant or slightly negative growth whilst wages catch up to bridge the affordability gap. The simple fact is that prices can only drop significantly if people are forced to sell, and although many first home buyers who bought in when interest rates were at their lowest are feeling the pressure now, they form only a small part of the market: not enough to trigger a price collapse, and most will simply delay selling until conditions improve.

It is unfortunate that the Australian dream is out of reach for a median single income earner, but many factors point towards housing becoming more affordable for them in future. The government could do a much better job of incentivizing the construction of low cost housing as current market conditions favour bigger, higher cost houses. Additional land releases and incentives for desirable, low cost housing would also go a long way to putting a downward pressure on house prices. It’s not a problem that can be fixed overnight either and we’ll need long lasting reforms in order to keep housing affordable, lest the prices rise and the cycle start all over again.

¹In statistics, the median is the value that splits a data set in half: 50% of the values lie above it and 50% below it. It’s much more resilient to outliers on either side than the average (mean), and Australian property and wage figures have many such outliers, so using the average would be less representative of the real world.
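To see the footnote's point in action, here is a tiny example with made-up house prices (in thousands of dollars; illustrative only, not real market data) where two mansions drag the average far away from what a typical buyer faces, while the median doesn't budge.

```python
from statistics import mean, median

# Hypothetical house prices in $000s (illustrative, not real market data):
# mostly modest homes plus two very expensive outliers.
prices = [310, 350, 380, 400, 417, 440, 470, 2_500, 4_000]

print(f"median: {median(prices)}")   # the middle value, unmoved by the outliers
print(f"mean:   {mean(prices):.1f}")  # dragged far above most of the data
```

The median stays at 417 while the mean lands near 1,030, higher than seven of the nine prices.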

Lies, Damn Lies and Small Sample Statistics.

It’s easy to get lost in the idea that the whole world is close to what you have experienced. Realistically, the only thing we have to go by is what we see and hear day by day, and philosophically we can’t even really prove that anything exists outside our own sphere of influence. Before I derail this post into a lot of hand-waving about cognition and awareness, I want to explore the misrepresentation of data, either through cherry-picking results or through sample bias from small or narrowly chosen populations.

Cast your mind back to 2004. The Australians among you will remember that this was the year of the federal election, and the last time John Howard would win his bid for Prime Minister. Back then I was still a teenager, but it was the first election I was eligible to vote in. Speaking to all my friends and family, I was convinced that this year we would oust Howard and usher in new blood to revive what I saw as a stagnant government. You can then imagine my shock when not only did the Liberal party win, but it did so by taking 5 seats away from Labor. The politically inclined among you would realise that Canberra is typically a Labor electorate, and if you took nothing but opinions from the people within Canberra you would come to the same conclusion I did. This was classic sample bias, and it led me to become more involved in politics, as I now knew I couldn’t trust just the people I talk to in order to extrapolate to Australia as a whole.

Just today a good friend of mine sent me this article, which also used flawed logic and a small sample size to make wild accusations about the general health of the gamer population:

ADULTS who play video games may suffer higher levels of depression and weigh more than non-gamers, according to a study released today.

The study, conducted by the US Centres for Disease Control and Prevention (CDC), Emory University and Andrews University, found “measurable correlations between video-game playing and health risks”.

The study – “Health-Risk Correlates of Video-Game Playing Among Adults” – is being published in the October issue of the American Journal of Preventive Medicine.

The researchers surveyed 562 adults ranging in age from 19 to 90 in the Seattle-Tacoma area of Washington state. A total of 45.1 per cent of those surveyed reported playing video games.

The sample is, to say the least, incredibly biased. Let me just pick out a couple of the problems with the data set they have used:

  • The sample size is incredibly small to be able to draw any substantial conclusions.
  • The use of weasel words like “may” and “measurable correlations” is not something you find in well-researched scientific reports.
  • The sample is taken from one area, which, according to this lovely animated graphic from the Centers for Disease Control and Prevention, has an obesity rate of greater than 30%.
  • Repeat after me, correlation does not equal causation.

If we were, say, to apply this to my group of friends (roughly a fifth of the study’s size, with a similar gamer/non-gamer split) you would probably find that gaming has little to no correlation with obesity and depression. In fact you’d see that the gamers, on average, tend to be healthier, but the problem is that whilst we all identify as game players we each have our own reasons for keeping fit and healthy. The numbers used and conclusions drawn are misleading at best, and anyone who’s spent even a small amount of time working with statistics will tell you that a sample of 0.0000094% of the population (562 people divided by 6 billion in the world) cannot be relied on.
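The one-area point in the list above is the same trap as my Canberra election prediction, and it's worth separating from sample size. A small sample adds noise that shrinks as you survey more people, whereas a sample drawn from one unrepresentative region adds bias that no number of extra respondents will remove. The simulation below illustrates this with made-up prevalence rates (the 25% and 32% are hypothetical, not figures from the study).

```python
import random

# Hypothetical rates, not figures from the study: the trait's true
# prevalence nationwide versus in the single region that was surveyed.
NATIONAL_RATE = 0.25
REGION_RATE = 0.32


def estimate(n, rate, rng):
    """Share of n randomly drawn people who have the trait."""
    return sum(rng.random() < rate for _ in range(n)) / n


def compare(sizes, seed=42):
    """For each sample size, survey one region and the whole population."""
    rng = random.Random(seed)
    return {n: (estimate(n, REGION_RATE, rng), estimate(n, NATIONAL_RATE, rng)) for n in sizes}


if __name__ == "__main__":
    for n, (regional, nationwide) in compare((562, 5_620, 56_200)).items():
        print(f"n={n:>6}: one region {regional:.3f}, nationwide {nationwide:.3f} "
              f"(true national rate {NATIONAL_RATE})")
```

As n grows, the nationwide estimate settles on 0.25, while the single-region estimate settles just as confidently on the wrong answer.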

Statistics are the one thing that everyone is familiar with but no one seems to understand completely. All too often I see reports being made or news articles being published that use fatally flawed mathematics, and unfortunately this often leads people to believe things they otherwise would not. For the mathematically inclined among us it then becomes a battle of education, giving people the tools to break down the arguments analytically; however, there’s only so far you can go before people stop listening.

As usual there’s a slight anti-mass-media bias to this post, but what I truly desire is for people to question the information they are given. We humans are wired to turn off the sceptical parts of our brains when an expert tells us something, and this is why we need to build up our bullshit detectors so we don’t get fooled by the people who wield the power of statistics. It just so happens that the biggest abusers of this power are the media.

And yes, the irony of using statistics to disprove statistics isn’t lost on me. I’ll still take the moral high ground on this issue, however 🙂