IGF 2013: Ze YouTube Playlist

This year’s Independent Games Festival is near; the Main Competition finalists were announced just this week. But since everyone already knows about those, and since I like collecting things and was bored, I made a huge YouTube playlist (three, to be exact, because of YT limitations) containing the trailers of all entries. This post is about how I did it.

Manually clicking through all 588 entries and adding every single one to the aforementioned playlist would have been madness; surely there must be a way to do this with some kind of script, right? Fortunately, there is. YouTube has a well-documented Data API, and thanks to Matthew Wegner's work on the judging backend, the IGF website now features this giant JSON feed, updated every 30 minutes from live data, for exactly this kind of third-party funzies.

Here we go

Why Bash? Because I love it. But oh noez, how to parse a bunch of JSON? Well, someone already did it. Behold:

curl http://submit.igf.com/json | ./JSON.sh | grep -E '"video","url"[^"]*' | tee videos.txt
["entries",0,"video","url"]     "http:\/\/www.youtube.com\/watch?v=LKNBa3yji8M"
["entries",1,"video","url"]     "http:\/\/www.youtube.com\/watch?v=fVs2lCz0oWg"
["entries",2,"video","url"]     "http:\/\/www.youtube.com\/watch?v=TPlkZRqnxUQ"
["entries",3,"video","url"]     "http:\/\/www.youtube.com\/watch?v=soLtbblQnew"
["entries",4,"video","url"]     "http:\/\/www.youtube.com\/watch?v=cz_h-v76gs4"
["entries",5,"video","url"]     "http:\/\/www.youtube.com\/watch?v=A6TFwXtyK-U"
["entries",6,"video","url"]     "http:\/\/www.youtube.com\/watch?v=jC1unLrT9oM"
["entries",7,"video","url"]     "http:\/\/www.youtube.com\/watch?v=GpjwVwZhoXo"
["entries",8,"video","url"]     "http:\/\/www.youtube.com\/watch?v=aJna79Tz7TQ"
["entries",9,"video","url"]     "http:\/\/www.vimeo.com\/42731394"
# ... etc

Which grabs that giant file, keeps only the video url fields we want, and tees them into videos.txt for the next steps. Now let's split off the Vimeo URLs (we'll need them later) and strip the remaining JSON cruft.

less videos.txt | grep vimeo > videos-vimeo.txt
# Save those for later.
less videos.txt | grep youtube | cut -f 2 | tr -d '"\\#\!' > videos-youtube.txt
# In fact, we only need the video IDs. Regex to the rescue!
grep -o -E '[-0-9A-Za-z_]{11}' videos-youtube.txt > youtube-ids.txt
# Blammo. Simple, but surprisingly effective.

Now we've got a list of YouTube video IDs. And because we read the goddamn manual, we're aware of YouTube's 200-videos-per-playlist limit, so let's split that file up for later:

split -d -l 200 youtube-ids.txt youtube-ids.
# Quick check if that did what we wanted.
wc -l youtube-ids*

Yay!

API Stuff

So Google wants us to use OAuth 2.0. Ugh. Luckily, as per their deprecation policy, they still provide the ClientLogin API until April 20, 2015.

Here we only need to send a single HTTPS POST request (and possibly solve a CAPTCHA), which temporarily authorizes our machine and gets us the token needed for subsequent API requests. And because we don't want or need to do stuff like calculating Content-Length headers manually, we can use curl:

curl https://www.google.com/accounts/ClientLogin \
--data-urlencode Email=jeremy@example.com --data-urlencode Passwd=foo+bar \
-d accountType=GOOGLE \
-d source=Ludonaut-YTplaylist-Test-v1 \
-d service=youtube

Which should return a response like this:

SID=DQAAAHYBADCv2pSv7nfacDNwz3zEDUGtrSvN...gI8KhGAQZV4NexHZoQPlabTsGuRZeIBxj1A
LSID=EUBBBIaBADCl-kNxvRVmcQghpt3cqSMfJ8Z...j6xFK6QxaAcqy_9Pej8jhEnxS9E61ftQGPg
Auth=EUBBIacAAADK-kNxvRVmcQghpt3cqSMfb44...PSnBj3Z2vYwOEDjjG3Q53aQVC2132JKOuGh
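
Only that Auth value matters for everything that follows. Assuming the response above was saved to a file (clientlogin.txt is just a made-up name here), the token can be pulled into a shell variable in one line:

# keep everything after the first '=' on the Auth line
auth=$(grep '^Auth=' clientlogin.txt | cut -d= -f2-)
echo "$auth"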

That Auth token goes into the Authorization header of every request from here on. The YouTube Data API speaks XML, so for each video we want to add to a playlist, a separate HTTP POST has to be made, containing the proper XML:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <id>VIDEO_ID</id>
  <yt:position>1</yt:position>
</entry>

The important part here is the VIDEO_ID. The <yt:position> tag is entirely optional; by default, a newly added video simply goes to the end of the playlist. Put this into a file, say entry.xml, and POST that shit:

curl --silent --request POST --data-binary "@entry.xml" \
--header "Content-Type: application/atom+xml" \
--header "Authorization: GoogleLogin auth=ABCDEFG" \
"http://gdata.youtube.com/feeds/api/playlists/PLAYLIST_ID"

Pow! Repeat 588 times.

Just kidding. The obvious next step is, of course, a simple shell script looping through the video IDs. It needs just the Auth token, three playlist IDs, and our three files of at most 200 YouTube video IDs each.
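
The script assumes those three playlists already exist. If you'd rather create them through the API as well, the Data API has a playlists feed for that; here's a rough, untested sketch (the endpoint is from the old GData docs as I remember them, and the title/summary are placeholders):

cat > playlist.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <title type="text">IGF 2013 Trailers (1/3)</title>
  <summary>All the trailers, part one.</summary>
</entry>
EOF

curl --silent --request POST --data-binary "@playlist.xml" \
--header "Content-Type: application/atom+xml" \
--header "Authorization: GoogleLogin auth=ABCDEFG" \
"http://gdata.youtube.com/feeds/api/users/default/playlists"

The new playlist's ID should be somewhere in the returned entry.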

Ze Script

#!/bin/bash

auth=ABCDEFG
playlist_id1=HIJK
playlist_id2=LMNO
playlist_id3=PQRS

function echoxml {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<entry xmlns="http://www.w3.org/2005/Atom"'
    echo 'xmlns:yt="http://gdata.youtube.com/schemas/2007">'
    echo "<id>$1</id>"
    echo '</entry>'
}

# brace expansion ftw
playlists=($playlist_id{1,2,3})
chunks=(youtube-ids.{00,01,02})

# pair each playlist with its chunk of (at most) 200 IDs
for n in 0 1 2
do
    while read -r id
    do
        file="$id.xml"
        echoxml "$id" > "$file"

        curl --silent --request POST --data-binary "@$file" \
        --header "Content-Type: application/atom+xml" \
        --header "Authorization: GoogleLogin auth=$auth" \
        "http://gdata.youtube.com/feeds/api/playlists/${playlists[$n]}"
    done < "${chunks[$n]}"
done

exit 0
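
To double-check that each playlist actually filled up, the playlist feeds can be read back; GData feeds carry an openSearch:totalResults element with the entry count, so a quick sanity check could look like this:

for pl in $playlist_id{1,2,3}
do
    curl --silent "http://gdata.youtube.com/feeds/api/playlists/$pl" \
    | grep -o -E '<openSearch:totalResults>[0-9]+</openSearch:totalResults>'
done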

Boom. Now, for those pesky Vimeo videos, I’m going to use youtube-dl, as it supports Vimeo just fine and even has an option for a batch file, which makes this final step a breeze:

less videos-vimeo.txt | cut -f 2 | tr -d '"\\#\!' > vimeolist.txt
# aww yeeaaah
youtube-dl -a vimeolist.txt

One could take this one step further and upload those videos via the Data API too, but I'll stop here. The in-browser uploader is just too convenient not to use in this case, and my goal was not to turn this into a chore. Anyway, welcome to the wonderful world of Bash scripting, where you save a ton of time writing cool command-line magic, only to waste all of it later bragging about it on your blog.

Thanks for reading.