So like most products that a developer creates with one purpose in mind my first iteration of Sortilio was pretty bare bones. Sure if you had a small media collection that was named semi-coherently it worked fine (like it did for my test data) but past that it started to fall apart rather rapidly. Case in point: I let it loose on my own media collection, you know for the purposes of eating my own dog food. It didn’t take long for it to fall flat on its face, querying The TVDB’s API so rapidly that the rate limiter kicked in almost instantaneously. There was also the issue of not being able to massage the data once it had done the automated matching portion as even the best automated tools can still make mistakes. With that in mind I set about improving Sortilio and put the finishing touches on it yesterday.
Now the first update you’ll notice is the slightly changed main screen with a new Options tab and two extra buttons down in the right hand corner. They all function pretty much as you’d expect: the options tab has a few options for you to configure (only one of them works currently, the extensions one), save will export the current selection to a file for use later and load will import said file back into Sortilio. The save/load functionality is quite handy if you’d like to manually go in there and sort out the data yourself as it’s all plain XML that I’m sure anyone with half a coding mind about them would be able to figure out. I put it in mostly for debugging purposes (re-running the identification process is rather slow, more on that in a bit) but I can see it being quite useful, especially with larger collections.
As I mentioned earlier whilst the automated matching does a pretty good job of getting things right there are times when it either doesn’t find anything or its got it completely wrong. To alleviate this I added in the ability for you to be able to double click the row to bring up the following screen:
Shown in this dialog is the series drop down which allows you to select from a list of episodes that Sortilio has already downloaded. The list is populated by the cache that Sortilio creates from its queries to The TVDB so if it managed to match one file in the series correctly it will have it cached already so you can just select it and hit update. Sortilio will then identify other files that had the same search term and ask if you’d like to update them as well (since it will have probably got them wrong as well). Should the series you’re looking for not be available you can then hit the search button which brings up this dialog:
From here you can enter whatever term you want and hit search. This will then query The TVDB and then display the results in a list for you. Select the most appropriate one and then hit OK and you’ll have the new series assigned to that file.
Under the hood things have gotten quite a bit better as well. The season string matching algorithm has been improved a bit so that identifies seasons better than it previously did. For instance if you had a file that was like say battlestar.galactica.2003.s01e20.avi Sortilio would (wrongly) identify that as season 20 because of the 2003 before the series/episode identifier. It now prefers the right kind of identifiers and is a little better overall at getting it right, although I still think that the way I’m going about it is slightly ass backwards. Chalk that up to still figuring out how to best do string splitting based on a regex.
Now on the surface if you were to compare this version to the previous it would appear to run quite a bit slower. There’s a good reason for this and it all comes down to the rate limit on The TVDB API. After playing around with various values I found that the sweet spot was somewhere around a 2 second delay between searches. Without any series cached this would mean that every request will incur a 2 second penalty, significantly increasing the amount of time required to get the initial sort done. I’ve alleviated this somewhat by having Sortilio search its local cache first before attempting to head out to the API but that’s still noticeably slower that it was originally. I’ve reached out to the guys behind The TVDB in the hopes that I can get an excerpt of their database that I can include within Sortilio that will make the process lightening fast but I’ve yet to hear back from them.
So as always feel free to grab it, have a play and then send me any feedback you have regarding it. I’ve already got a list of improvements to make on this version but I’d definitely call this usable and to prove a point I have indeed used it on my own media collection. It gets about 90% of the way there with the last 10% needing manual intervention, either within Sortilio or outside cleaning up after it has done its job. If you’ve used it and encountered problems please save the sort file and the debug log and send them to me at [email protected].
You can grab the latest version here.
[NOTE: There is no link currently because gmail barfed at the file attachment I sent myself to upload this morning. Follow me on Twitter to be notified of when it comes out!]
My post last week about the trials and tribulations of sorting ones media collection struck a chord with a lot of my friends. Like me they’d been doing this sort of thing for decades and the fact that none of us had any kind of sense to our sorting systems (apart from the common thread of “just leave it where it lies”) came at something of a surprise. I mean just taking the desk I’m sitting at right now for an example it’s clear of everything bar computer equipment and the stuff I bring in with me every day. The fact that this kind of organization doesn’t extend to our file systems means that we either simply don’t care enough or that it’s just too bothersome to get things sorted. Whilst I can’t change the former I decided I could do something about the latter.
So my quest last week proving fruitless I set about developing a program that could sort media based on a couple cues derived from the files themselves. Now for the most part media files have a few clues as to what they actually are. For the more organized of us the top level folder will contain the episode name but since mine was all over the place I figured it couldn’t be trusted. Instead I figured that the file name would be semi-reliable based on a cursory glance at my media folder and that most of them were single strings delimited with only a few characters. Additionally the identifier for season and episode number is usually pretty standard (S01E01, 2×01,1008, etc) so that pulling the season out of them would be relatively easy. What I was missing was something to verify that I was looking in the right place and that’s where I TheTVDB comes in.
The TV Database is like IMDB for TV shows except that it’s all community driven. Also unlike IMDB they have a really nice API that someone has wrapped up in a nice C# library that I could just import straight into my project. What I use this for is a kind of fuzzy matching filter for TV show names so that I can generate a folder with the correct name. At this point I could also probably rename the files with the right name (if I was so inclined) but for the point of making the tool simple I opted not to do this (at this point). With that under my belt I started on the really hard stuff: figuring out how to sort the damn files.
Now I could have cracked open the source of some other renaming programs to see how they did it but I figured out a half decent process after pondering the idea for a short while. It’s a multi-stage process that makes a few assumptions but seems to work well for my test data. First I take the file name and split it up based on common delimiters used in media files. Then I build up a search string using those broken up names stopping when I hit a string that matches a season/episode identifier. I then add that into a list of search terms to query for later, checking first to see if it’s already added. If it’s already in there I then add the file path into another list for that specific search term, so that I know that all files under that search term belong to the same series. Finally I create the new file location string and then present this all to the user, which ends up looking like this:
The view you see here is just a straight up data table of the list of files that Sortilio has found and identified as media (basically anything with the extension .avi or .mkv currently) and the confidence level it has in its ability to sort said media. Green means that in the search for the series name it only found one match, so it’s a pretty good assumption that it’s got it right. Yellow means that when I was doing a search for that particular title I got multiple responses back from TheTVDB so the confidence in the result is a little lower. Right now all I do is take the first response and use that for verification which has served me well with the test data, but I can easily see how that could go wrong. Red means I couldn’t find any match at all (you can see what terms I was searching for in the debug log) and everything marked like that will end up in one giant “Unsorted” folder for manual processing. Once you hit the sort button it will perform the move operations, and suffice to say, it works pretty darn well:
Of course it’s your standard hacked-together-over-the-weekend type deal with a lot of not quite necessary but really nice to have features left out. For starters there’s no way to tell it that a file belongs to a certain series (like if something is misspelled) or if it picks the wrong series to tell it to pick another. Eventually I’m planning to make it so you can click on the items and change the series, along with a nice dialog box to search for new ones should it not get it right. This means you might want to do this on a small subset of your media each time (another thing I can code in) as otherwise you might get files ending up in strange folders.
Also lacking is any kind of options page where you can specify things like other extensions, regex expressions for season/episode matching and a whole host of other preferences that are currently hard coded in. These things are nice to have but take forever to get right so they’ll eventually make their way into another revision but for now you’re stuck with the way I think things should be done. Granted I believe they’ll work for the majority of people out there, but I won’t blame you if you wait for the next release.
Finally the code will eventually be open sourced once I get it to a point where I’m not so embarrassed by it. If you really want to know what I did in the ~400 odd lines that constitute this program then shoot me an email/twitter and I’ll send the source code to you. Realistically any half decent programmer could come up with this in half the amount of time I did so I can’t imagine anyone will need it yet, unless you really need to save 3 hours 😛
So without further ado, Sortilio can be had here. Download it, unleash it on your media files and let me know how it works for you. Comments, questions, bugs and feature requests can be left here as a comment, an @ message on Twitter or you can email me on [email protected].