Reverse engineering a list of used games

In Belgium we have a retailer that sells games, buys them back, and then tries to sell them again. They have a slow, bulky website that lets you search through the used games by platform and by preferred store.

But wouldn't it be cool if we could scrape this data, and see when games are brought in and when they disappear again?

Of course it would. So, on to reverse engineering the website!!

Step 1: Figuring out the URL

https://www.gamemania.be/services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc  

This is the URL I curl. Put through a URL decoder, the part after GetNextPage/ looks like this:

{DBBA01E1-9E2C-41D5-8B98-924DC693DD89}?dataSource={DBBA01E1-9E2C-41D5-8B98-924DC693DD89}
&currentQuery=filter=used_available_in.eq.sint niklaas/has_used_article.eq.true/platform_name.eq.playstation 4
&page=1
&page_size=2000
&sort=product_release_date
&sort_type=desc

This is the URL that gets called when you scroll down on the used products page. It dynamically loads the next 8 results. But we have cranked this up to 2000 results, so we only have to query the server once instead of a few hundred times.

Next we notice that the filter contains a location, which is the store closest to me, and my platform of choice: the PS4.
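Since the store and the platform are just strings inside the filter, the URL can be put together from its parts. Here is a minimal sketch of that; the helper name used_games_url is made up for this example, and whether other stores or platforms follow the exact same naming pattern is an assumption I have not checked.

```ruby
require 'erb'

# GUID copied verbatim from the URL above; what it identifies is not documented.
DATASOURCE = '{DBBA01E1-9E2C-41D5-8B98-924DC693DD89}'
BASE = 'https://www.gamemania.be/services/gamemaniabe/nl-BE/ProductLister/GetNextPage/'

# Hypothetical helper: builds the GetNextPage URL for a store and a platform.
def used_games_url(store:, platform:, page_size: 2000)
  filter = [
    "used_available_in.eq.#{store}",
    'has_used_article.eq.true',
    "platform_name.eq.#{platform}"
  ].join('/')
  query = "filter=#{filter}&page=1&page_size=#{page_size}" \
          '&sort=product_release_date&sort_type=desc'
  guid = ERB::Util.url_encode(DATASOURCE)
  # The inner query string is percent-encoded as a whole, because it travels
  # as the value of the single currentQuery parameter.
  "#{BASE}#{guid}?dataSource=#{guid}&currentQuery=#{ERB::Util.url_encode(query)}"
end

puts used_games_url(store: 'sint niklaas', platform: 'playstation 4')
```

ERB::Util.url_encode is used here because it encodes a space as %20, like the original URL does, where URI.encode_www_form_component would produce a +.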

But what happens when we try to curl this URL? We get a 200 OK, but with an empty body (Content-Length: 0):

$ curl -vvv 'https://www.gamemania.be/services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc'
*   Trying 40.68.229.194...
* Connected to www.gamemania.be (40.68.229.194) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 697 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_CBC_SHA384
*      server certificate verification OK
*      server certificate status verification SKIPPED
*      common name: www.gamemania.be (matched)
*      server certificate expiration date OK
*      server certificate activation date OK
*      certificate public key: RSA
*      certificate version: #3
*      subject: 
*      start date: Tue, 26 Jul 2016 00:00:00 GMT
*      expire date: Wed, 18 Apr 2018 23:59:59 GMT
*      issuer: C=GB,ST=Greater Manchester,L=Salford,O=COMODO CA Limited,CN=COMODO RSA Extended Validation Secure Server CA
*      compression: NULL
* ALPN, server did not agree to a protocol
> GET /services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc HTTP/1.1
> Host: www.gamemania.be
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK  
< Cache-Control: private  
< Server: Microsoft-IIS/8.5  
< Set-Cookie: gamemaniabe#lang=nl-BE; expires=Sun, 31-Jan-2117 14:01:25 GMT; path=/  
< Set-Cookie: ASP.NET_SessionId=du3l1hp2apbt4yvu5zf43sp4; path=/; HttpOnly  
< Set-Cookie: SC_ANALYTICS_GLOBAL_COOKIE=3437feae42334e00b9c6f91b266bf8af|False; expires=Sun, 31-Jan-2027 14:01:25 GMT; path=/; HttpOnly  
< Set-Cookie: GamaUserData=; expires=Mon, 30-Jan-2017 14:01:25 GMT; path=/; HttpOnly  
< Set-Cookie: GamaBasketData=; expires=Mon, 30-Jan-2017 14:01:25 GMT; path=/; HttpOnly  
< Date: Tue, 31 Jan 2017 14:01:25 GMT  
< Content-Length: 0  
<  
* Connection #0 to host www.gamemania.be left intact

That's annoying... why does it work in a browser, but not with curl? Because the browser sends a Referer header, and we need to add one too!

$ curl -vvv 'https://www.gamemania.be/services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc' --referer 'https://www.gamemania.be/games/Used/?filter=used_available_in.eq.sint%20niklaas/has_used_article.eq.true/platform_name.eq.playstation%204&page=1&page_size=1&sort=product_release_date&sort_type=desc' | less
*   Trying 40.68.229.194...
* Connected to www.gamemania.be (40.68.229.194) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 697 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_CBC_SHA384
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: www.gamemania.be (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: 
*        start date: Tue, 26 Jul 2016 00:00:00 GMT
*        expire date: Wed, 18 Apr 2018 23:59:59 GMT
*        issuer: C=GB,ST=Greater Manchester,L=Salford,O=COMODO CA Limited,CN=COMODO RSA Extended Validation Secure Server CA
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET /services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc HTTP/1.1
> Host: www.gamemania.be
> User-Agent: curl/7.47.0
> Accept: */*
> Referer: https://www.gamemania.be/games/Used/?filter=used_available_in.eq.sint%20niklaas/has_used_article.eq.true/platform_name.eq.playstation%204&page=1&page_size=1&sort=product_release_date&sort_type=desc
> 
< Cache-Control: private  
< Content-Type: text/html; charset=utf-8  
< Server: Microsoft-IIS/8.5  
< Set-Cookie: gamemaniabe#lang=nl-BE; expires=Sun, 31-Jan-2117 14:03:21 GMT; path=/  
< Set-Cookie: ASP.NET_SessionId=j3svvjxp4jwsibpmuu3jli2o; path=/; HttpOnly  
< Set-Cookie: SC_ANALYTICS_GLOBAL_COOKIE=a73435f8a212495c9cd22a7d0b3c0281|False; expires=Sun, 31-Jan-2027 14:03:21 GMT; path=/; HttpOnly  
< Set-Cookie: GamaUserData=; expires=Mon, 30-Jan-2017 14:03:21 GMT; path=/; HttpOnly  
< Set-Cookie: GamaBasketData=; expires=Mon, 30-Jan-2017 14:03:21 GMT; path=/; HttpOnly  
< Date: Tue, 31 Jan 2017 14:03:22 GMT  
< Content-Length: 442777  
<  
            <div class="productItem--vView productItem row--products--grid-4" data-webid="productLister-item" data-usedproductavailableincount="33" data-productnewrating="3" data-productusedrating="3" data-pro
                <article>
                    <a class="wrap" href="/games/playstation-4/dark-souls-iii">
                        <header>
                            <h1>
                                <span data-webid="product-name" title="Dark Souls III">Dark Souls III</span>
                            </h1>


                            <div class="meta">

                                                                    <div class="spec  price price--used " data-webid="product-priceused" data-productpriceused="39.98">
                                        <div class="value">
                                            <span class="currency">&#8364;</span> 39<span class="decimal">.98</span>
                                            <span class="type">used</span>
                                        </div>
                                    </div>
                                                                        <div class="spec  price price--new" data-webid="product-pricenew" data-productpricenew="49.98">
                                            <div class="value">
                                                <span class="currency">&#8364;</span> 49<span class="decimal">.98</span>
                                                <span class="type">new</span>
                                            </div>
                                        </div>
                            </div>
                        </header>
                        <figure>
                            <div class='image imageToFit' style='background: url(https://gamemania-sec.azureedge.net/-/media/Sites/GameMania/Products/Games/D/DARK-SOULS/Dark-Souls-III/PlayStation-4/Dark-Souls-
                            </figure>
                        </a>
                    </article>
                </div>

Look at all that glorious data!!! It's not that well aligned, but we can manage that of course.

Step 2: Gather and store data locally

So we write some Ruby and put it in a cronjob that gathers all the data every few hours, like this:

#!/usr/bin/env ruby

require 'open-uri'
require 'nokogiri'
require 'json'
require 'time' # Time.parse
require 'date' # Date.today


url = 'https://www.gamemania.be/services/gamemaniabe/nl-BE/ProductLister/GetNextPage/%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D?dataSource=%7BDBBA01E1-9E2C-41D5-8B98-924DC693DD89%7D&currentQuery=filter%3Dused_available_in.eq.sint%20niklaas%2Fhas_used_article.eq.true%2Fplatform_name.eq.playstation%204%26page%3D1%26page_size%3D2000%26sort%3Dproduct_release_date%26sort_type%3Ddesc'

referrer = 'https://www.gamemania.be/games/Used/?filter=used_available_in.eq.sint%20niklaas/has_used_article.eq.true/platform_name.eq.playstation%204&page=1&page_size=2000&sort=product_release_date&sort_type=desc'

jsonfile = '/var/www/html/games.lan/games.json'

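# Kernel#open no longer opens URLs since Ruby 3.0, so call URI.open explicitly.
# Without the Referer header the server answers with an empty body.
xml = Nokogiri::HTML(URI.open(url, "Referer" => referrer, "X-Requested-With" => "XMLHttpRequest"))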

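# Load the results of earlier runs, so firstseen survives between runs.
# File.exists? was removed in Ruby 3.2; File.exist? works everywhere.
if File.exist?(jsonfile)
  input = JSON.parse(File.read(jsonfile))
else
  input = {}
end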

xml.xpath("//article").each do |game|  
  title = game.xpath('./a/header/h1/span/text()').to_s
  usedprice = game.xpath('./a/header/div/div[1]/@data-productpriceused').to_s
  newprice =  game.xpath('./a/header/div/div[2]/@data-productpricenew').to_s
  image = game.xpath('./a/figure/div/@style').to_s.gsub(/[^\(]+\(([^\(]+)\)/, '\1').to_s
  link = 'https://www.gamemania.be' + game.xpath('./a/@href').to_s
  discount = (newprice.to_f - usedprice.to_f) / newprice.to_f * 100
  if input[title].nil?
    firstseen = Time.now
  else
    firstseen = Time.parse(input[title]["firstseen"].to_s)
  end

  data = {
    "title" => title,
    "usedprice" => usedprice,
    "newprice" => newprice,
    "image" => image,
    "link" => link,
    "discount" => discount,
    "firstseen" => firstseen,
    "lastseen" => Time.now
  }

  input[title] = data
end

# Remove every entry that was last seen more than a week ago
threshold = Date.today - 7
input.delete_if do |_key, game|
  Time.parse(game['lastseen'].to_s).to_date < threshold
end

File.open(jsonfile,"w") do |f|  
  f.write(JSON.pretty_generate(input))
end  
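The bookkeeping in the script boils down to two rules: keep the first firstseen ever recorded for a title, and drop titles whose lastseen is too old. Here is a minimal sketch of just that logic without any scraping. The helper names record_sighting and prune and the second game title are made up for this example, and unlike the script it compares seconds rather than calendar days.

```ruby
require 'time'

WEEK = 7 * 24 * 60 * 60 # seconds

# Hypothetical helper: note that `title` was seen at `now`,
# keeping the original firstseen if the game is already known.
def record_sighting(store, title, now)
  first = store.dig(title, 'firstseen') || now.to_s
  store[title] = { 'firstseen' => first, 'lastseen' => now.to_s }
  store
end

# Hypothetical helper: drop games not seen for more than `max_age` seconds.
def prune(store, now, max_age = WEEK)
  store.reject { |_title, game| now - Time.parse(game['lastseen']) > max_age }
end

store = {}
day = 24 * 60 * 60
t0 = Time.parse('2017-01-01 12:00:00 +0100')

record_sighting(store, 'Dark Souls III', t0)
record_sighting(store, 'Some Other Game', t0)            # made-up title
record_sighting(store, 'Dark Souls III', t0 + 10 * day)  # still in the shop

store = prune(store, t0 + 10 * day)
p store.keys
```

The game that was only seen on the first day falls out after the pruning step, while the one that showed up again keeps its original firstseen.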

Which generates output like this:

{
  "Dark Souls III": {
    "title": "Dark Souls III",
    "usedprice": "39.98",
    "newprice": "49.98",
    "image": "https://gamemania-sec.azureedge.net/-/media/Sites/GameMania/Products/Games/D/DARK-SOULS/Dark-Souls-III/PlayStation-4/Dark-Souls-III/34409.ashx?v=YvX9UH2Ko0yUdTq/6vbBkg&Type=Medium",
    "link": "https://www.gamemania.be/games/playstation-4/dark-souls-iii",
    "discount": 20.008003201280513,
    "firstseen": "2016-12-21 10:50:45 +0100",
    "lastseen": "2017-01-31 12:30:04 +0100"
  },
  ...
}
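The discount in that output is the percentage you save by buying used: (newprice - usedprice) / newprice * 100. One thing to watch out for, assuming some listings might come without a new price: an empty string turns into 0.0, the division then gives -Infinity, and as far as I know the JSON generator refuses to write that. A guarded version could look like the sketch below; discount_percent is a made-up helper name, and rounding to two decimals is my own choice.

```ruby
# Hypothetical helper: percentage saved by buying used, or nil when
# either price is missing or the new price is zero.
def discount_percent(usedprice, newprice)
  used = Float(usedprice, exception: false)
  new_price = Float(newprice, exception: false)
  return nil if used.nil? || new_price.nil? || new_price.zero?

  ((new_price - used) / new_price * 100).round(2)
end

p discount_percent('39.98', '49.98') # → 20.01
p discount_percent('39.98', '')      # → nil
```

A nil ends up as null in the JSON file, which a page rendering the data can then skip or show as unknown.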

And last but not least, we wrap some HTML around it, et voilà:

https://games.vuokko.be was born!!!
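The HTML wrapping itself is not shown here, so the following is only a guess at what a bare-bones version could look like: an ERB template from Ruby's standard library that turns entries shaped like games.json into a table. The table layout is an assumption for this example, and the inline sample stands in for reading the real file.

```ruby
require 'erb'
require 'json'

# Inline sample in the same shape as games.json; a real page would
# read the file written by the cron script instead.
games = JSON.parse(<<~SAMPLE)
  {
    "Dark Souls III": {
      "title": "Dark Souls III",
      "usedprice": "39.98",
      "newprice": "49.98",
      "link": "https://www.gamemania.be/games/playstation-4/dark-souls-iii",
      "discount": 20.008003201280513
    }
  }
SAMPLE

# trim_mode '-' lets the <%- ... -%> lines vanish from the output.
template = ERB.new(<<~HTML, trim_mode: '-')
  <table>
    <tr><th>Game</th><th>Used</th><th>New</th><th>Discount</th></tr>
  <%- games.each_value do |g| -%>
    <tr>
      <td><a href="<%= ERB::Util.h(g['link']) %>"><%= ERB::Util.h(g['title']) %></a></td>
      <td><%= g['usedprice'] %></td>
      <td><%= g['newprice'] %></td>
      <td><%= g['discount'].round %>%</td>
    </tr>
  <%- end -%>
  </table>
HTML

html = template.result(binding)
puts html
```

ERB::Util.h escapes the title and link, so a game name containing & or < does not break the page.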