W?at?even?is?encod???anyw??

The other day I wanted to find a link that we discussed a few years ago and couldn’t think of the name of it. I spent about 20 minutes asking people, with no luck. Then I had a brainstorm: check the irc bot logs.

So I open up postgres and start digging.

databas=# select url, title, created_at from links where url like '%youtube%' and lower(title) like '%dick%';
                           url                           |                                                    title                                                    |         created_at
---------------------------------------------------------+-------------------------------------------------------------------------------------------------------------+----------------------------
 http://www.youtube.com/watch?v=xvABlAckloo              | Saints Row: The Third - Beat You With A Dick Bat - YouTube                                                  | 2013-07-21 22:21:14.873167
 https://www.youtube.com/watch?v=baeiVOKgWMo&app=desktop | Dick Will Make You Slap Someone! - YouTube                                                                  | 2014-04-23 14:43:58.282664
 https://www.youtube.com/watch?v=GMX6OluCfjc             | Barbara Markay - Give Your Dick To Me - YouTube                                                             | 2014-06-09 15:34:55.604511
 https://www.youtube.com/watch?v=4zYWyXwKrHg             | Silicon Valley Season 1 Episode 8 (S01E08) || Erlich's Dick Theory/Dick Joke Scene/Mean Jerk Time - YouTube | 2014-06-17 20:30:54.511909
 http://www.youtube.com/watch?v=wnJVPmXpRvE#t=50         | SO SUCK A DICK - YouTube                                                                                    | 2014-10-03 15:03:29.603921
(5 rows)

Well that number of results seems really low considering, but ok. So I change the filters a bit, and tried several different keywords, like ‘dong’, ‘dick’, ‘announcement’, ‘pivot’, etc but nothing really returned. Ok, so how old is our oldest link from youtube, and let’s work our way forward.

databas=# select url, title, created_at from links where url like '%youtube%' limit 10;
                            url                             |             title              |     created_at
------------------------------------------------------------+--------------------------------+---------------------
 http://www.youtube.com/watch?v=R8wXWTK_owI&feature=related | ????????????????               | 2008-07-10 02:49:27
 http://www.youtube.com/watch?v=ZchrpWMczNM                 | ?????????????????????????????? | 2008-07-10 04:10:41
 http://www.youtube.com/watch?v=lkDbT4GbXyA                 | ??????????????                 | 2008-07-10 19:13:45
 http://www.youtube.com/watch?v=DDZ47fvEG9c                 | ??????????????                 | 2008-07-10 19:43:25
 http://www.youtube.com/watch?v=JboQmDIdKWs                 | ????????????????????           | 2008-07-11 01:06:55
 http://www.youtube.com/watch?v=klteYv1Uv9A                 | ???????????????????            | 2008-07-11 02:35:47
 http://youtube.com/watch?v=o-jFKW4vrCw                     | ???????????????????            | 2008-07-11 04:50:39
 http://www.youtube.com/watch?v=0w0c0DOLrsw                 | ??????????????????????????     | 2008-07-11 17:14:43
 http://www.youtube.com/watch?v=MD1sAKcdQUg                 | ???????????????                | 2008-07-11 20:29:47
 http://www.youtube.com/watch?v=OeBKHj_WaEY                 | ???????????????????????????    | 2008-07-11 20:31:41
(10 rows)

Dafuq? Fantastic, the titles are mangled for almost 5 years worth of links. I suspect this happened when I switched from sqlite to postgres (for maximum irc scalablity).

Because this was REALLY bothering me and I wanted to watch the video, I decided to pull all youtube links and fix the titles so I could maybe find what I was after. I saved all unique youtube urls to a file called ‘youtube’ and wrote some #devops to get the correct titles.

require 'mechanize'

agent = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

output = File.open('titles', 'w')
total = File.open('youtube').count

File.open('youtube') do |input|
  input.each_line do |link|
    blob = agent.get(link)
    # 'Sorry about that.' appears in the body when a video is removed
    # 'This video is private.' appears in the body when a video is private (duh)
    if blob.body =~ /Sorry about that./ || blob.body =~ /This video is private./
      puts "[#{input.lineno}/#{total}] blocked or private"
      next
    else
      title = blob.title.gsub(' - YouTube', '')
      puts "[#{input.lineno}/#{total}] #{link.chomp} => #{title}"
      output.puts "#{link.chomp} => #{title}"
      output.flush
    end
  end
end

output.close

After it ran (took about 3 minutes wall clock time for ~1800 links) I eyeballed the titles looking for something… what, I wasn’t entirely sure, but I was sure it had something to do with drawing penises in a social network type medium.

No dice, however. So, I gave it a bit more poking and finally gave up. It wasn’t really that important and I figured I could dig it out of the old lady’s AIM logs later on that night. I got back to work in the redis mines and passed a few hours uneventfully.

Later, as I am staring blankly at the wall, I think “bebo? that sounds familiar” and solved the mystery.

This weekend if I get bored I am going to use the above code to run through each of the links in the database that has a screwed up title and update them with the correct title. I don’t like jacked up data.

The overall lesson here is: #devops, not even once.

At least I finally found the link: