I buy DVDs but don't like watching them. They are cumbersome and slow to load, and the plastic they're made of is too soft and far too easily damaged. So I rip them to my hard drive and watch them there, keeping the DVDs safe as my archive. The MPEG-2 format that DVDs use is very wasteful of space, so I prefer to re-encode the files to something a lot more compact.
(Note that the following examples work on Linux and might work on Mac OS X, but will work on MS Windows only if you have installed the GNU tools, and even then will need tweaking.)
First, I use mplayer to rip the file, for example:
mplayer dvd://1 -dumpstream -dumpfile "video.vob"
(to rip the first track as "video.vob" -- there may be dozens of tracks on the DVD.)
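Since a disc can hold many tracks, the same command can be wrapped in a loop. What follows is only a sketch: the helper name rip_cmd and the trackNN.vob naming are assumptions made for illustration, and only the mplayer flags come from the command above. It prints the commands rather than running them, so they can be checked first.

```shell
#!/bin/sh
# Sketch only: rip_cmd and the trackNN.vob naming are illustrative assumptions.
# Prints the mplayer command that would dump DVD track $1 to a .vob file.
rip_cmd() {
    printf 'mplayer dvd://%d -dumpstream -dumpfile "track%02d.vob"\n' "$1" "$1"
}

# Dry run for tracks 1 to 3: review the printed commands before ripping.
for track in 1 2 3; do
    rip_cmd "$track"
done
```

Once the printed commands look right, piping the output to sh would run them one after another.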
Next I encode that as MPEG-4 video. It is slightly complicated by the fact that DVDs store videos as narrower images and expect the player to stretch them horizontally on playback. The intended final width is therefore stored within the file structure, and that data is lost when the video is re-encoded. There is an open format header that lets you explicitly record the intended width and height, but not all video players use it, so the simplest way is to stretch the video to the correct final width while encoding it. This is tricky. I do it like this:
width=`mplayer "video.vob" -benchmark -nosound -vo null -endpos 1 2>/dev/null | grep '=>' | cut -d' ' -f5 | cut -dx -f1`
mencoder "video.vob" -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=1800:v4mv:mbd=2:trell -aid 128 -nosub -vf scale=$width:576,hqdn3d=2:1:2 -audio-delay -0.2 -oac mp3lame -lameopts vbr=3 -o "video.avi"
The first line gets mplayer to display information about the file, searches for the line containing "=>", which describes the view size, then cuts up the result to get the intended width value, which gets stored in the $width variable.
The second line uses mplayer's sister program mencoder to encode the video to a more efficient form while scaling it to the appropriate width and a height of 576 (the height of PAL DVDs; NTSC discs use 480).
You could, of course, use just the second line, substituting whatever size you want to scale it to. For shows that are just talking I commonly use something like scale=512:288, or scale=384:288 because the image quality doesn't matter if all you're interested in is the witty dialogue.
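The width extraction is the fragile part, so here is a small sketch of it with a fallback for when no "=>" line is found. The helper names display_width and width_or_default are hypothetical, the sample line is only an illustration of what mplayer prints, and 1024 is an assumed default for 16:9 material at a height of 576.

```shell
#!/bin/sh
# Sketch only: display_width and width_or_default are hypothetical helper names.

# Reads mplayer's console output on stdin and prints the stretched width,
# using the same grep/cut steps as the one-liner above but keeping only
# the first matching line.
display_width() {
    grep '=>' | head -n 1 | cut -d' ' -f5 | cut -dx -f1
}

# Prints $1 if it is a plain number, otherwise the fallback in $2
# (1024 when no fallback is given; an assumed default, not from mplayer).
width_or_default() {
    case "$1" in
        ''|*[!0-9]*) echo "${2:-1024}" ;;
        *)           echo "$1" ;;
    esac
}

# Illustrative sample of the line being searched for; real output may differ.
sample='VO: [null] 720x576 => 1024x576 Planar YV12'
width=$(width_or_default "$(printf '%s\n' "$sample" | display_width)")
echo "scale=$width:576"
```

In real use the sample line would be replaced by the output of the mplayer command from the first line of the example, piped into display_width.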
A bitrate of 1800 is fine for most things, but old black-and-white movies can manage with maybe 1200, whereas modern movies with a lot of action may need 2200 or even higher. Note that higher bitrates increase the file size. I should also mention that the example uses the audio ID of 128 for the soundtrack (128 is most common for English soundtracks). There are typically anywhere from one to several soundtracks, and any audio commentary, if it exists, will be one of them. You can see what audio is on a DVD video by typing into a terminal:
mplayer dvd://1
(for video track 1 -- there may be several video tracks on the DVD -- it will play the movie and at the same time print info about the track in the terminal where you issued the command.)
Or, for example:
mplayer dvd://8 -aid 132
(lets you experiment by playing the 8th track on the DVD with audio ID 132)
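Coming back to the bitrate for a moment: its effect on file size is easy to estimate before encoding. This is a rough sketch, assuming the vbitrate value is in kbit/s (as mencoder treats values in this range) and counting the video stream only, not the audio. The helper name estimate_video_mb is hypothetical.

```shell
#!/bin/sh
# Sketch only: estimate_video_mb is a hypothetical helper name.
# Estimates the video stream size in megabytes (1 MB = 1,000,000 bytes)
# from a bitrate in kbit/s and a running time in seconds. Audio is not included.
estimate_video_mb() {
    kbps=$1
    seconds=$2
    # kbit/s * s = kbit; / 8 = kilobytes; / 1000 = megabytes
    echo $(( kbps * seconds / 8 / 1000 ))
}

# A 90-minute film (5400 seconds) at the three bitrates mentioned above.
for rate in 1200 1800 2200; do
    echo "$rate kbit/s: about $(estimate_video_mb "$rate" 5400) MB"
done
```

For a 90-minute film this gives about 810 MB at 1200, 1215 MB at 1800 and 1485 MB at 2200, before the audio is added.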
Now we are getting close to the main reason for this post.
I am going deaf, so I need the subtitles. I rip them like this:
mencoder dvd://1 -ovc copy -oac copy -vobsubout "video" -vobsuboutindex 0 -sid 2 -nosound -o /dev/null
This produces two files called "video.idx" and "video.sub". And here is the peculiar catch of DVD subtitles: they use a very clumsy method of subtitling video. The first part, the .idx file, is an index, just a text file that tells when each subtitle should be displayed. The second, the .sub file, is a series of pictures of the subtitles that get overlaid on the movie at the appropriate times. It would have been far more sensible to use one of the more efficient, open formats, such as .srt subtitles, which are ordinary text containing both the timing and the text itself.
(If you are wondering how to find the subtitle ID, which is '2' in the example above, get mplayer to tell you:
mplayer dvd://4
to see info about the 4th video track on a DVD. Or play it with the desired subtitles:
mplayer dvd://3 -sid 0
for DVD track 3 with subtitle 0, which is usually English.)
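The audio and subtitle IDs can also be listed without sitting through the movie, by asking mplayer for machine-readable output with -identify and filtering it. This is a sketch under assumptions: the helper name list_track_ids is hypothetical, and the ID_AID_..._LANG and ID_SID_..._LANG line format is what mplayer is expected to print, so check it against real output from your own build.

```shell
#!/bin/sh
# Sketch only: list_track_ids is a hypothetical helper name, and the ID_... line
# format is an assumption about what "mplayer -identify" prints.
# Reads identify output on stdin and prints audio and subtitle IDs with languages.
list_track_ids() {
    grep -E '^ID_(AID|SID)_[0-9]+_LANG=' |
        sed -E 's/^ID_AID_([0-9]+)_LANG=(.*)$/audio \1: \2/; s/^ID_SID_([0-9]+)_LANG=(.*)$/subtitle \1: \2/'
}

# Typical use (needs a DVD in the drive, so it is left commented out):
# mplayer dvd://1 -identify -frames 0 -vo null -ao null 2>/dev/null | list_track_ids
```

The numbers it prints are the values to pass to -aid and -sid in the commands above.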
So, how does one convert DVD subtitles to something much more efficient like SRT format? Well, yesterday I found out about a lovely little program which does the trick. It is called vobsub2srt, logically enough. It extracts the images from the .sub file and runs them through an OCR (optical character recognition) program to work out what the text is. The OCR program that vobsub2srt uses is the free, open-source Tesseract, which is brilliantly accurate but unfortunately doesn't yet learn from its (few) mistakes, so spellchecking the file afterwards is necessary.
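As a rough sketch of the conversion step: vobsub2srt is given the shared base name of the .idx/.sub pair rather than either file. The helper names sub_basename and convert_cmd are hypothetical, and the --lang en option is an assumption to verify against vobsub2srt --help on your system. The script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: sub_basename and convert_cmd are hypothetical helpers, and the
# "--lang en" option is an assumption to verify against "vobsub2srt --help".

# Strips the .idx extension to get the base name shared by the .idx/.sub pair.
sub_basename() {
    printf '%s\n' "${1%.idx}"
}

# Prints the conversion command for one .idx file (dry run).
convert_cmd() {
    printf 'vobsub2srt --lang en "%s"\n' "$(sub_basename "$1")"
}

convert_cmd "video.idx"
```

Run for real, that command should leave a "video.srt" next to the .idx and .sub files, ready for spellchecking.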
And now we come to what occupied me for so long today.
How can I get a music symbol, like ♪, to appear in the subtitles to denote singing?
Well, I finally worked it out!
The .srt file needs to be stored in UTF-8 format, which allows thousands of possible symbols and lets me simply paste the appropriate symbol into the text in a text editor. But that is only half the solution. Next I have to get mplayer to display the text as UTF-8. It took me a while, but I finally realised that mplayer's preferences allow choosing the subtitle file encoding. Unfortunately, no matter what I chose, it still didn't play properly. Eventually I tried typing "UTF-8" (without the quotes) into the encoding gadget (instead of choosing "Unicode" from the drop-down menu), and it worked! Later on, the subtitle file encoding gadget no longer showed "UTF-8", but mplayer still played the subtitles properly. Yay!
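For anyone who prefers the command line to the preferences window, here is a small sketch of the same idea. It writes a one-cue .srt file containing the music symbol, checks that the file really is valid UTF-8, and shows (commented out) how mplayer could be told the encoding with its -utf8 option. The helper name is_utf8 and the file name song.srt are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: is_utf8 and the song.srt file name are illustrative assumptions.

# Succeeds if the file is valid UTF-8 (iconv fails on invalid byte sequences).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

srt="${TMPDIR:-/tmp}/song.srt"

# Write a one-cue SRT file: cue number, time range, text, blank line.
# The \342\231\252 escapes are the three UTF-8 bytes of U+266A, the music symbol.
printf '1\n00:00:01,000 --> 00:00:04,000\n\342\231\252 La la la \342\231\252\n\n' > "$srt"

if is_utf8 "$srt"; then
    echo "$srt is valid UTF-8"
fi

# Playing with the subtitle encoding forced to UTF-8 (left commented out):
# mplayer -utf8 -sub "$srt" "video.avi"
```

If the check fails on a file saved by an editor, the file was probably written in some other encoding and needs to be re-saved as UTF-8 before the symbol will display.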
Note that in all the examples above you would, of course, not call the file "video.avi", but would use something a little more descriptive, for example:
"Buffy - 05-15 - I was Made to Love You.avi"
(That's one of my favorite episodes of Buffy the Vampire Slayer, by the way. Particularly poignant, while retaining the nutty brand of humor that show was famous for.)
So there you go. Now you can keep your DVDs safe from scratches, while allowing far more convenient viewing than that ridiculously clumsy medium was ever designed for -- the best of both worlds.