09.11.11

RubyInline Tricks and Tips

Posted in Linux, Ruby at 3:31 pm by mike

Permutative Schedule Calculation

One of the most compute-intensive tasks in the RSportz system is the scheduling code. It turns out that scheduling pool and cross-pool games between a number of teams is actually a rather challenging computer science problem. Add in seeding, home/away venue requirements, and calendar limitations and you have yourself a pretty hairy set of variables to deal with. You end up with a fairly classic permutation problem. So much so that the algorithm RSportz uses was derived from this paper: U. Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. Proc. 40th FOCS, 1999. (Still looking for a copy of this BTW.)

The code was originally written in Java and then directly translated to Ruby by someone who didn’t really know either language that well. The first red flag I found was the hand-coded bubble sorts, which were copied directly from the Java version. :(

But the core algorithm was pretty straightforward. Create an NxN matrix where N represents the number of teams. Rows are (arbitrarily) home games and columns are away games. Walk through the matrix and block out all the games that can’t be played due to seeding or other restrictions, then start filling the matrix with games, ensuring that the number of home and away games for each team is balanced. Once you have a complete matrix for the number of games each team should play, score that matrix according to the mix of cross-group play to ensure as much randomization as possible, while still avoiding top-seed matchups when possible.
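
To make the setup step a little more concrete, here is a minimal Ruby sketch of building the matrix and blocking out games. The BLOCKED marker, the build_matrix helper, and the rule that the top-seeded teams can’t meet are assumptions made up for this example; this is not the RSportz code.

```ruby
# Hypothetical sketch: build an NxN schedule matrix and block out pairings
# that cannot be played. Rows are home teams, columns are away teams.
BLOCKED = -1

def build_matrix(total_teams, top_seeds)
  matrix = Array.new(total_teams) { Array.new(total_teams, 0) }
  total_teams.times do |home|
    total_teams.times do |away|
      same_team = home == away
      # Assumed seeding rule: teams are numbered by seed, and the top
      # `top_seeds` teams may not play each other.
      seed_clash = home < top_seeds && away < top_seeds
      matrix[home][away] = BLOCKED if same_team || seed_clash
    end
  end
  matrix
end

matrix = build_matrix(6, 2)
open_slots = matrix.flatten.count(0)
puts "open slots: #{open_slots}"
```

The cells still holding 0 are the ones the scheduler is free to fill with games.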

The real world comparison here is the NFL, where each team must play every team in their division twice, and then some mix of cross-division and cross-conference play. While I suspect the NFL primarily uses TV market size data to drive their out of division play, our algorithm wants to make sure every team plays as many different divisions as possible in its cross-group play.

So we score each complete matrix, and then move on to another combination, until we either reach the maximum possible score, exhaust all combinations of the matrix, or hit a time limit. Hopefully we have at least one matrix before hitting the time limit. Now those of you familiar with combinatorial math know these numbers get very large very quickly. For instance, for 20 teams playing 5 games each, the number of combinations is “400 choose 50” or 1.70359003 × 10^64 (a really big number). Now we do eliminate some games since we know each team can’t play itself, and there may be seed, home and away game restrictions, but as you can see, it can be a time consuming problem to solve.
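
If you want to sanity check the size of “400 choose 50” yourself, here is a small sketch in Ruby. The choose helper is my own name for illustration; it relies only on Ruby’s arbitrary precision integers. The 400 comes from the 20x20 matrix cells and the 50 from 20 teams × 5 games / 2 teams per game.

```ruby
# n choose k using exact integer arithmetic (Ruby integers never overflow).
def choose(n, k)
  k = [k, n - k].min
  # After step i the accumulator equals C(n - k + i, i), so each
  # division is exact.
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

combinations = choose(400, 50)
puts "digits: #{combinations.to_s.length}"
puts format("approx: %.4e", combinations.to_f)
```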

So when I first started playing with the system, I was a little shocked to find out that scheduling 5 cross-group games between 4 groups of 10 teams took 3 minutes (basically, it was hitting the built-in timeout). I first walked through the code replacing all the bubble sorts with native Ruby array sorting calls, and attempting to reduce unnecessary object copying where possible. Finally, I realized that the system was finding the best possible score for this scenario in 30 seconds, but was continuing to process until the time limit was exhausted. So the first thing I did was calculate the highest possible score for the matrix, and then made the algorithm stop when it was hit. (We still need to go back and consider the time it takes to reach a perfect score vs. settling for 60-80%.) That got me down to 30 seconds for the 40 team scenario, but it still seemed way too slow for me, at which point I started investigating how to call out to C code in Ruby.
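
The stopping rule is simple enough to sketch. This is a toy version, not the real scheduler: best_candidate, its arguments, and the integer “matrices” are all hypothetical, and the scoring is handed in as a block.

```ruby
# Hypothetical sketch of the stopping rule: keep the best-scoring candidate
# and stop on a perfect score, an exhausted search space, or the time limit.
def best_candidate(candidates, max_score, time_limit)
  deadline = Time.now + time_limit
  best = nil
  best_score = -Float::INFINITY
  tested = 0
  candidates.each do |candidate|
    break if Time.now > deadline        # time limit hit
    tested += 1
    score = yield(candidate)
    if score > best_score
      best = candidate
      best_score = score
    end
    break if best_score >= max_score    # nothing better is possible
  end
  [best, best_score, tested]
end

# Toy usage: the "matrices" are just integers and the score is the value itself.
best, score, tested = best_candidate(1..1_000_000, 10, 5) { |m| m }
puts "best=#{best} score=#{score} tested=#{tested}"
```

Without the max_score check, the toy run above would grind through all million candidates (or the full time limit) even though nothing could beat the tenth one.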

As an extra motivator, when this problem was described to a friend of mine, he felt compelled to tweet “Re-implementing your O(n!) algorithm in C (from Ruby) probably isn’t going to help”.   I have to thank him for the inspiration.

RubyInline

I cannot say enough good things about RubyInline.  It makes dropping into C code simply trivial since you just place your C code in the middle of your Ruby code.   No Makefiles or compiler options to worry about.    You simply include the gem

require "inline"

And then add your C code as a string inside an inline do block:

  inline do |builder|
    builder.c '
    static VALUE assign_c_wrapper(VALUE total_teams, VALUE matrix, VALUE team_games, ....) {
       ...
       return ret ? Qtrue : Qfalse;
    }'
  end

The argument to builder.c needs to be a string, so you do need to be careful with your single and double quotes and make sure they are escaped properly if needed in the code. Obviously, anything that returns a string would work here, so the C code could live in its own file and be returned by an IO method, but that undermines the whole point. A search for RubyInline will show you all sorts of examples.

Although RubyInline will do automatic argument conversion for you with simple data-types, I was dealing with 2D arrays so I chose not to take that route and converted them myself, hence the reason all my arguments are VALUEs, which are essentially just pointers to Ruby objects.

Now the folks at ZenSpider have taken this one step further and have another gem called RubyToC, which simply translates your Ruby code directly to C and then compiles it via RubyInline. Quite clever, but once again, only the simplest data types are supported.

A couple of bookmarks that are handy to have while you’re writing the code: One is the Extending Ruby chapter from the Programming Ruby guide. This has a nice table which lays out the type conversions. The other one is Yukihiro Matsumoto’s original README.ext, which is in the Ruby source distribution but is also kept online and nicely formatted in HTML by eqqon.com.

So the code to convert a 2 dimensional array from Ruby to C ends up looking like this:

      // matrix is the Ruby NxN integer array.  Created with the code
      //  matrix=Array.new(total_teams) { Array.new(total_teams, 0) }

      char c_matrix[c_total_teams][c_total_teams];
      VALUE *matrix_row = RARRAY_PTR(matrix);
      for (i = 0; i < c_total_teams; i++) {
        VALUE *matrix_col = RARRAY_PTR(matrix_row[i]);
        for (j = 0; j < c_total_teams; j++) {
          c_matrix[i][j] = FIX2INT(matrix_col[j]);
        }
      }

The inline builder block also takes some useful arguments: compiler flags when needed, as well as a “prefix” section which basically gets treated as a C include file for the block of code. Being an old C pre-processor hack, I found this very handy. More on that later.

inline(:C) do |builder|
       builder.add_compile_flags '-ggdb'
       builder.prefix '
// We need to play games in order to pass around 2D arrays of variable size
// Luckily, we know the size of the second dimension
#define MATRIX(_i, _j) matrix[((_i) * total_teams) + (_j)]

#define TPG_FACTOR 1.2
#define GPG_FACTOR 1.1
#define GPT_FACTOR 1
...
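
The MATRIX macro above is just row-major indexing into a flat block of memory. Here is the same arithmetic sketched in Ruby; the cell lambda and the sample values are made up for illustration.

```ruby
# Row-major flattening: the same index arithmetic as MATRIX(_i, _j).
total_teams = 4
matrix = Array.new(total_teams) { |i| Array.new(total_teams) { |j| i * 10 + j } }
flat = matrix.flatten

# Look up element (i, j) in the flat array.
cell = ->(i, j) { flat[i * total_teams + j] }

puts cell.call(2, 3) == matrix[2][3]
```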

The one gotcha was that the line numbering in gdb was off. RubyInline adds #line directives to the C code it generates under ~/.ruby_inline in your home directory, but they are off. I tried to adjust them in the prefix section without much luck, but I was able to use gdb to step through my code after adjusting the line numbers manually.

16-156x Performance Improvement

So most users of RubyInline report a 10x performance improvement, but even given the factorial nature of my algorithm, I strongly suspected we could do even better. As I mentioned before, the code was originally written in Java, and I actually don’t expect you would see a significant performance increase between Java and C, but the algorithm was a worst case scenario for a duck-typed language with no native datatypes like Ruby. For instance, take the following line directly from the existing Ruby code:

    matrix=Array.new(total_teams) { Array.new(total_teams, 0) }

Seems simple enough, all I want is an NxN array of integers, with each value initialized to 0. But in Ruby, just because they are integers at this moment in time doesn’t mean someone can’t insert an ActiveRecord object into the middle of this matrix sometime in the future. So in this case we end up asking the memory management system for N arrays and N*N integer objects and initialize each one to 0. Now for the same code in C:

    char matrix[total_teams][total_teams];
    bzero(matrix, sizeof(matrix));

Optimized by the compiler, literally three instructions to multiply the values, bump the stack pointer, and then zero the memory range. And my Intel ASM knowledge is 15yrs old. I wouldn’t be surprised to find there’s now one instruction to do all of the above.

Of course, we don’t allocate the matrix O(n!) times, but we do execute the following harmless looking operation:

matrix[row][col] += 1

Once again, quite a bit of work in a duck-typed language where the interpreter has no idea what type of object is at that position in the matrix, while in C

matrix[row][col]++

Is once again optimized down to 1 instruction.
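
To see why the Ruby version is so much more work, it helps to spell out what the shorthand stands for. A small sketch (the 3x3 matrix and the indexes are arbitrary):

```ruby
matrix = Array.new(3) { Array.new(3, 0) }
row, col = 1, 2

# The shorthand form...
matrix[row][col] += 1

# ...does the same work as this chain of method calls, each one
# looked up at run time:
r = matrix.[](row)
r.[]=(col, r.[](col).+(1))

puts matrix[row][col]
```

Both forms go through Array#[], Integer#+ and Array#[]=, so the interpreter has to dispatch several calls for what C treats as a single increment.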

By the Numbers

So my test case (spec of course) was pretty simple. It schedules games for a 4 group round robin league with 10 teams in each group. Each group played 6 games within its own group and 4 cross-group games. Teams were seeded such that the top 20 seeded teams cannot play each other until the playoffs start. The algorithm is recursive by nature, so I counted the number of recursions and the number of times we scored a valid matrix and sent that to the debug log each time a better scoring matrix was found. In this case, I let the code run until the timeout was hit, so we could better compare the speed of the code.

Language   # Games   # Teams   Recursions    Matrices Tested   Time
Ruby       30        10        662,063       2,521             16s
C          30        10        662,934       3,392             1s
Ruby       80        40        5,313,768     22,495            180s
C          80        40        138,884,345   3,508,954         180s

Note that in the 80 game case, we still hit the time limit before we reached the best possible score for the matrix, while in the smaller case, we found the best matrix prior to hitting the time limit. We do have some knowledge of which teams are going to be the most difficult to schedule, hence we can order the matrix to try those teams first, so the longer the algorithm runs, the less likely it is we’ll find a better scoring matrix. Since I was only measuring to one-second resolution, the 30 game case showed roughly a 16x performance improvement. But for the 80 game case, the C code was able to test 156 times more matrices than the Ruby code in the same 180 seconds. Have to say I’m pretty happy with the results, though I’ve since done some further optimizations on our scoring algorithm, and on deciding when a game matrix is “good enough”.
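
For anyone who wants to check my math, the ratios fall straight out of the numbers in the table. A quick sketch (the variable names are mine):

```ruby
# 30 game case: both versions finished, so compare wall-clock time.
ruby_secs = 16
c_secs = 1
time_speedup = ruby_secs / c_secs.to_f

# 80 game case: both versions ran the full 180s, so compare work done.
ruby_matrices = 22_495
c_matrices = 3_508_954
matrix_ratio = c_matrices / ruby_matrices.to_f

ruby_recursions = 5_313_768
c_recursions = 138_884_345
recursion_ratio = c_recursions / ruby_recursions.to_f

puts format("time speedup:    %.0fx", time_speedup)
puts format("matrices tested: %.0fx", matrix_ratio)
puts format("recursions:      %.0fx", recursion_ratio)
```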

I suspect my theory-favoring friend will argue that a sample set of 1 doesn’t disprove his assertion, but my recommendation for optimizing code has been repeated by many others: profile, profile, profile.

Misc Tricks

The one thing that irritated me while developing the C code is that I found myself wanting a DEBUG wrapper, which I’ve written many times before but didn’t have on hand. Since I had to recreate the macro from scratch, I figured I’d include the version here for myself and anyone else who needs it. In this case, I wanted to call the debug method on the Ruby logger from within C code, and make it easy enough to insert debug messages in the C code. I’ve been accused of abusing the C pre-processor in the past, so your mileage may vary:

#define DEBUG_LOG(_format, ...) \
    do { \
      VALUE _logger = rb_iv_get(self, "@logger"); \
      ID _debug = rb_intern("debug"); \
      char _message[2048]; \
      snprintf(_message, sizeof(_message), _format, ##__VA_ARGS__); \
      _message[sizeof(_message) - 1] = 0; \
      rb_funcall(_logger, _debug, 1, rb_str_new2(_message));\
    } while(0)

The “do {} while(0)” is an old trick to avoid having to worry about nested blocks or semi-colons. The use of the code is as follows:

      DEBUG_LOG("SCHED: assign(%ld) evaluate score increased from %f to %f after %ld"
                " attempts! teams: %d, games: %d, groups %d, %ld seconds elapsed",
                GET_MEMBER_LONG("@assign_calls"), best_score,  score, GET_MEMBER_LONG("@evaluation_attempts"),
                total_teams, games_scheduled, ngroups, time(0) - start_time);

Finally, I also found the following macros handy (all this was included in my builder.prefix section):

#define GET_MEMBER_LONG(_name) FIX2LONG(rb_iv_get(self, _name))
#define GET_MEMBER_DBL(_name) NUM2DBL(rb_iv_get(self, _name))
#define INCR_MEMBER(_name)  rb_iv_set(self, _name, LONG2FIX(GET_MEMBER_LONG(_name) + 1))

Finally, if you ever forget the syntax for passing a multidimensional array in C, it looks something like this:

     static int verify(char team_games[][2], int row, int col, int ngames, int max_games) {

You need to declare N-1 of the dimensions, or pass them in as variables and then do the indexing yourself.

04.16.11

RadioParadise HD Plugin for Windows Media Center

Posted in Home Theater, Windows at 2:04 pm by mike

So I’ve become a big fan of RadioParadise.    It’s a listener supported, commercial free radio station which plays a great mix of new, old and eclectic rock, with a random mix of everything else when they feel like it.  Actually, it’s best explained if you just go there and listen.  I also have a shortcut on my phone and it’s the only music I listen to in the car now.

So I’m a fan of the music, but the cool thing they’ve added this year is an HTML 5 192K HD feed along with a photo slide show called RadioParadise HD. The photos are all high resolution, meant to be seen on the big screen, but the really cool thing is that they are uploaded by the community, so if you have some high quality 16×9 photos, you can upload them and potentially see your own pics there.

So, obviously, a Media Center Plugin is needed so you can use your remote to bring up the music and slide show.    A couple things you’ll need.

  1. IE9:   Since the HD player is implemented in HTML 5, you’ll need IE 9, Firefox or Chrome.    I tried all three and (surprisingly) had the best experience with IE 9 as far as running in kiosk mode and resizing correctly.    Chrome has security issues being launched from WMC and FF seems to crash after running the feed for a few hours.
  2. Autohotkey: An extremely cool and easy to use scripting utility.   Used to turn off the screen saver and hide the mouse.
  3. nomousy: A utility from the autohotkey community to hide and restore the mouse upon exit
  4. Media Center Studio: To build the plugin.

First, install Autohotkey and cut & paste the following script into a file called rplaunch.ahk (or you can just download my pre-built binary from here):

; Disable Screen Saver
DllCall("SystemParametersInfo", Int,17, Int,0, UInt,NULL, Int,2)
; Hide the mouse
Run, C:\bin\nomousy.exe /hide
; Run IE in kiosk mode pointing to the rphd stream
RunWait, "C:\Program Files\Internet Explorer\iexplore.exe" -k http://radioparadise.com/rphd.php
; Show the mouse
Run, C:\bin\nomousy.exe
; Enable Screen Saver
DllCall("SystemParametersInfo", Int,17, Int,1, UInt,NULL, Int,2)

Obviously, you should change the file to point to where you installed nomousy, or just install it in C:\bin as I did.   Then just right click on the .ahk file and select “Compile”.    You should now have an rplaunch.exe binary.   Put this in C:\bin as well.

Now run Media Center Studio.    Warning, the UI here is a little obtuse, so just follow these steps:

  1. Once you start the app, click on the “Start Menu” icon on the main toolbar (my version has a blank icon)
  2. Now click on the Entry points expansion button in the lower left hand corner
  3. Now click on the “Start Menu” tab at the top, and you should see something like this:
  4. Now click the “Application” icon, and fill it out as follows.   Note I put my rplaunch.exe in a location with no spaces in the directory names.    I can’t swear that a location with a space doesn’t work, but it was one of the variables I eliminated during my testing.
  5. To get the Back and MediaStop buttons to exit the app for you, press the green “+” button, and then press the keys on your keyboard/remote:
  6. Hit the disk icon in the upper left (Save), close the tab and you should be returned to the Start Menu.   The new app should show up in the Entry points list.
  7. Drag and Drop your new app from the Entry Points to the location on the Start Menu you desire.   Hint: The TV and Movies row is not editable by default, so put this in the Music row, or go read this thread.
  8. Hit Save again and restart Media Center.

BTW:  Here’s the icon I used as well.

If anyone is willing to package this all up into an installable (or even give me instructions) I’d be happy to provide a download site.

Enjoy.

03.10.11

Ubuntu 10.10 VNC keyboard mapping nightmare.

Posted in Linux at 10:38 am by mike

My Amazon EC2 dev box crashed yesterday and I had to rebuild it from scratch. Installed Ubuntu 10.10, configured VNC and ran into a keyboard mapping problem. Every time I hit the letter “d”, all my windows would iconify. I went down many false paths as did this poor soul, applying solutions from old releases.   I was finally able to determine the problem was actually the window manager, in this case it was metacity.    I didn’t bother to try to change the window manager in the gnome settings, because they didn’t appear to be running correctly either.    The simple solution was to just specify fvwm in .vnc/xstartup:

#!/bin/sh

xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
#x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
#x-window-manager &
# Fix to make GNOME work (doesn't do shat for me)
export XKL_XMODMAP_DISABLE=1
gnome-panel &
fvwm

Hopefully this will save someone else the time I wasted on this.

02.02.11

Replacing the ATI Radeon 5450 with an NVidia GT 430

Posted in Home Theater, Intel, Windows at 11:11 am by mike

So I had reached the end of my rope with the MSI ATI Radeon card on a couple fronts. I had HDMI bitstreaming working with Arcsoft TMT (as long as AnyDVD was running), and I could even put up with the flaky Catalyst UI and drivers. But I was stymied trying to get the refresh rate set to 23.976, for which the Kuro PRO-141FD will execute a 3:3 pulldown, giving picture perfect Blu-Ray playback at 71.928. Given the cost of the graphics card vs the 60″ plasma, it was time to make a change.
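
The two refresh rates are related by simple arithmetic: “23.976” is shorthand for 24000/1001 frames per second, and a 3:3 pulldown shows each frame three times. A quick sketch in Ruby (the variable names are mine):

```ruby
# "23.976" is the usual shorthand for the NTSC-adjusted film rate 24000/1001.
film_rate = Rational(24_000, 1001)

# A 3:3 pulldown shows every film frame three times.
display_rate = film_rate * 3

puts format("film rate:    %.3f Hz", film_rate.to_f)
puts format("display rate: %.3f Hz", display_rate.to_f)
```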

So with $75 of Amazon coupons burning a hole in my pocket, I decided to give the Zotac NVidia GT430 a shot. Since it goes in the HTPC next to the TV, silent cooling was a must.   Gaming performance is a non-issue for me, and since the Kuro isn’t getting replaced anytime soon, 3D video support wasn’t important either, though the Arcsoft BD & 3D assistant gave the card a thumbs up on all accounts.

Upon opening the package, the Zotac NVidia card looks much bigger than the MSI ATI with the giant heat sink, but both cards take up two slots in the machine.  The Zotac actually has two brackets, which make for a nice secure installation. Also, the Zotac has DVI, HDMI and DisplayPort connections, while the MSI had VGA, DVI and HDMI. What a difference 9 months makes.

MSI ATI Radeon HD5450

Zotac ZONE GeForce GT430

Installation was painless, and then it was time to download the NVidia drivers. A couple things impressed me right off the bat:

  • Clean install option – The NVidia drivers will blow away all previous NVidia registry settings and configuration when checking this box.   Very nice when you’ve mucked around with one too many registry hacks.
  • No “crap-ware” in install.   Thank you, I don’t need a 2 week trial to LOTR online…
  • Windows performance index:   ATI 5450: 4.9,  NVidia 430: 6.7!   Very impressive for a fanless card still less than $100.

So the next thing to try was getting to 1920x1080p@23.976. First of all, the NVidia Control Panel was so much easier to navigate than even the ATI Catalyst Beta (the old ATI UI was horrid. The latest is bearable). From there, getting to 23Hz couldn’t have been easier. Although it’s not listed in the defaults, click Custom, and 23p, as well as 59p, are at the top of the list.

No need to dig into the “Create Custom Resolution” dialog (but I wish the ATI UI had that!)

So that was too easy. Hmm… what about my other Blu-Ray playback issues? While I’ve had HD bitstreaming working with the ATI card for a while, I’ve had two other problems with the Arcsoft TMT software. First off, for some reason the TMT player refuses to play ANY BD disc. No explanation given, and all the HDCP tests come out fine. This started happening with their 3.0.1-170 release, and continues through 5.0.1.87. The only fix I’ve found is to install Slysoft AnyDVD. Unfortunately, the problem wasn’t the ATI card in this case, and I still need AnyDVD to watch BD. Not the end of the world since I already own it, but a little disappointing since the software has questionable DMCA legal status in the US. (BTW: If you haven’t figured it out by now, do NOT use your HTPC as your sole Blu-Ray player, unless you want to spend twice the money for twice the headaches.)

The second problem I’ve encountered with TMT is that during BD playback (with bitstreaming) the audio will get out of sync if I decide to pause, rewind or fast forward. Fairly irritating. Luckily, Arcsoft has created a hotfix for ATI cards if you encounter this problem, and that seemed to work, though I needed to re-apply it on the latest version. Now for the NVidia… Change refresh rate to 23.976, pop in the Inception BD, press play and wait for the DTS-HD MSTR display on the SC-07… DTS!?!?! WTF!!!!! Arrrggghhh!!! After getting this far, I’m no longer bitstreaming the uncompressed HD audio track!

OK.  Off to Arcsoft Forums to see if anyone else is experiencing this.     Found one guy from back in Dec, but it’s not clear he knows what he’s doing….    Post my problem…. Next day check the forum (no email subscription!?).  Hmm… it seems Arcsoft has only certified the 260.99 driver, while I had downloaded 266.58.    Back to the NVidia site, archived drivers, 260.99, download.    Remove 266, install 260 (with the clean install option), reboot, play BD…. WHOOO HOOO!!! DTS-HD MSTR is back!    AND no problem with pause, ff, rew, etc all at true 1080p24!

So one thing I noticed is the 260 “Clean Install” check box didn’t do such a great job. Even though I had removed the old driver and rebooted before installing, I was still prompted by numerous “Newer File Exists” messages during the install. Furthermore, the nice list of resolutions you see above all showed up blank with the older driver, but Windows Monitor properties still said I had 59Hz and 23Hz available. I should probably go back to a restore point prior to installing 266 and then install the 260 version again, but it’s working the way I want, so I’m not sweating it for now. I may just wait for Arcsoft to support the 266 drivers, and then upgrade again.

So while not perfect, the NVidia still wins the day.     Time to not touch it if it ain’t broke.   We’ll see how long that lasts!  :)

01.23.11

Hulu update and quick Media Center Studio fix

Posted in Windows at 2:40 pm by mike

I just received a comment on an old Hulu/AutoHotKey fix I had done some time ago to get a better resolution for Hulu streaming. Seeing this, it occurred to me that since then, Microsoft made some changes that broke Media Center Studio. Searching the Australian Media Center Community (which has a couple of interesting projects you won’t find on TGB), there are a number of work-arounds suggested, but I found this one the easiest to implement:

Edit C:\ProgramData\Microsoft\eHome\Packages\MCEClientUX\dSM\StartResources.dll with a binary editor (gvim) and replace these two references to dSM:

xmlns:Movies = "data://dSM!SM.Movies.xml"
xmlns:TV = "data://dSM!SM.TV.xml"

with

xmlns:Movies = "data://ehres!SM.Movies.xml"
xmlns:TV = "data://ehres!SM.TV.xml"


This should work fine until Windows Update replaces the DLL, then you just need to make the change again. If editing a binary file is a little too much for you, you’re welcome to try my modified version, though your mileage may vary.   If Windows Update changes the file, shoot me a note and I’ll update the DLL on my site.

Finally, the reader was also nice enough to include two Hulu images to use for creating the icons.

11.20.10

Windows 7 Performance Tricks

Posted in Windows at 5:08 pm by mike

Just stumbled across this blog posting.    Pretty good stuff.

11.19.10

Windows 7 Sleep Debugging

Posted in Home Theater, Windows at 10:29 pm by mike

So after I added the SSD to the HTPC and moved my hard disk into the  NAS drive, I began to notice the machine no longer automatically goes to sleep.    If I hit the power button (set to sleep) or run it from the command line (%windir%\System32\rundll32.exe powrprof.dll,SetSuspendState Standby) the machine goes to sleep just fine, but won’t do so automatically.

Mucking with powercfg (which is a very interesting program BTW.  Run powercfg /? if you’re not familiar with it) I found the following:

powercfg -requests

[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\blah\IMG_9088.jpg] Process ID: [4456]
[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\SANY0014.jpg] Process ID: [4456]
[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\blah\IMG_9160.jpg] Process ID: [4456]
[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\SANY0118.jpg] Process ID: [4456]
[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\IMG_0053.jpg] Process ID: [4456]
[DRIVER] \FileSystem\rdbss
A file has been opened across the network. File name: [\medianas\Photos\blah\blah\PC060102.jpg] Process ID: [4456]
….

So basically, the Media Center Screen saver is actually preventing the system from going to sleep because it has all these photo files open across the network.    I really like the screen saver, and I don’t want to suck up space on the SSD with photos.   What to do?

It turns out there’s a hidden setting in powercfg which allows the machine to sleep when files are open over the network. I assume the option is hidden by default because most folks wouldn’t understand what “remote opens” were. There’s a very helpful blog post here which explains a number of hidden power/sleep settings in gory detail. Lo and behold, after importing the registry file (run regedit as administrator):

Hidden option in Windows Power Config

Still need to do a little more testing but I figure this should do the trick.   Just a couple more handy commands for my own reference:

Put the computer to sleep from the command line

%windir%\System32\rundll32.exe powrprof.dll,SetSuspendState Standby

What was the last reason the PC woke up from sleep

powercfg -lastwake

List all the requests from processes to prevent the computer from sleeping:

powercfg -requests

Analyze any power usage issues the system might have

powercfg -energy

See “powercfg /?” for more info.

10.29.10

Jumbo Frames, iSCSI and Disabling Nagle

Posted in Home Theater, Windows at 12:47 pm by mike

So the new project (which I get to spend about 15 minutes a week on) has been to remove the mechanical harddisk from my HTPC and have it run completely silently off the SSD.   The first stage was purchasing the ReadyNAS NV+.  2TB drives have dropped below $100 so a NAS device with 1Gb networking was the no brainer solution.

Given I have two ATI CableCARD tuners (which BTW are no longer being made or supported by ATI), I needed to make sure the network and NAS had enough bandwidth and performance to write two simultaneous HD streams, while reading a third.  Now in theory, with a 1Gb network, you should be able to get roughly 100MBps of throughput. Unfortunately, the real world doesn’t work that way.

My initial testing using Lan Speed Test (more on that later) showed I was getting around 24MBps writes and 45MBps reads. (Remember big Bs are Bytes and little bs are bits. 8 bits to a byte.) So looking at my Comcast HD recordings, it turns out Fox is broadcasting 1920×1080 MPEG2 at around 14Mbps. This isn’t that bad. Note that DirecTV uses MPEG4 at around 5Mbps. Since MPEG4 delivers better quality at lower bit rates, the DirecTV advertisements aren’t lying when they say the picture is better. For comparison, a typical DVD is MPEG2 at 9.8Mbps and a Blu-ray is MPEG4 at 40Mbps. So if you haven’t figured it out yet, HD can really mean anything you want.

But the answer is if you want to record 2 HD TV channels from Comcast, you need about 2MBps per stream, or really just 4MBps in total, but you definitely want to have some headroom, and there’s also the need to play DVD images back from the drive.
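
Since the bits vs. bytes thing trips everyone up, here is the conversion spelled out as a small Ruby sketch. The helper name is mine; the 14Mbps stream rate and the 24MBps measured write speed are the numbers quoted above.

```ruby
BITS_PER_BYTE = 8

# Convert megabits per second (Mbps) to megabytes per second (MBps).
def mbps_to_mbytes_per_sec(mbps)
  mbps / BITS_PER_BYTE.to_f
end

stream_mbps = 14           # Comcast HD MPEG2 stream
measured_write_mbytes = 24 # Lan Speed Test write result

per_stream = mbps_to_mbytes_per_sec(stream_mbps)
two_streams = per_stream * 2
headroom = measured_write_mbytes / two_streams

puts format("per stream:  %.2f MBps", per_stream)
puts format("two streams: %.2f MBps", two_streams)
puts format("headroom:    %.1fx", headroom)
```

So the “about 2MBps per stream” figure is 1.75MBps rounded up.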

I was hoping to get a little better than 25% of maximum throughput, so I started looking for solutions. The first thing I poked at was the settings for Nagle’s algorithm. This basically tells the TCP stack to gather all the small requests into one big one before sending. Turns out gamers want to disable this feature to make sure every keystroke/click is sent the moment they enter it, and not let things batch up on the network card. For media streaming, you want the opposite behavior, but you’re rarely sending small data packets anyways. But just for kicks, I set the following registry settings:

HKLM\SYSTEM\CurrentControlSet\services\Tcpip\Parameters\Interfaces\{nicid}\

GlobalMaxTcpWindowSize = 0x01400000 (DWORD)
TcpWindowSize = 0x01400000 (DWORD)
Tcp1323Opts = 3 (DWORD)
SackOpts = 1 (DWORD)
TcpAckFrequency = 4 (DWORD)
TcpDelAckTicks = 2 (DWORD)

And ran this command:

C:\> netsh interface tcp set global rss=disabled chimney=disabled autotuninglevel=disabled congestionprovider=none

Mostly from recommendations from this iSCSI site (more on that below):

Basically I’m saying the opposite.   Package up as many small bits as possible into larger ones to avoid the overhead.  Difficult to measure the differences here, but I’m recording them here for myself in case I run into problems.   Though, it turns out there is another TCP feature along the same lines that does help with streaming media and was much more noticeable.

I had already done all the cache optimizations possible on the ReadyNAS, and configured it for RAID 0 since I really don’t need redundancy for recorded TV shows and I’m using Amazon S3 for offsite backup as described here. One of the features I found on the ReadyNAS was support for Jumbo Frames. So it turns out the standards the Internet still runs on today were defined over 30 years ago. Given the reliability of Ethernet at the time, the designers decided that 1500 bytes was the largest amount of data to be communicated in each packet, so if the receiver didn’t get the packet, the resend wouldn’t be so large. In today’s home gigabit switched networks, collisions and data corruption are almost unheard of. So rather than waste all the CPU & interrupt time splitting and joining small packets, you just build one big one. This is also a bit more efficient because each packet also requires header and footer information describing where it should be delivered to. Unfortunately, because everyone has to follow a standard, the largest the Jumbo Frame packet goes to is 9K, but that’s still a 6x increase in the data delivered with the same header and footer used for the original frame size.
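
To put rough numbers on the header overhead, here is a back-of-the-envelope sketch in Ruby. The overhead figures are assumptions on my part: standard Ethernet framing (8 byte preamble, 14 byte header, 4 byte checksum, 12 byte inter-frame gap) and plain IPv4 + TCP headers with no options. The payload_efficiency helper is just for illustration.

```ruby
# Per-frame costs on the wire, in bytes (assumed standard values).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12 # preamble + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20            # IPv4 + TCP, assuming no options

# Fraction of the bytes on the wire that is actual payload for a given MTU.
def payload_efficiency(mtu)
  (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD).to_f
end

standard = payload_efficiency(1500)
jumbo = payload_efficiency(9000)

puts format("1500 byte frames: %.1f%% payload", standard * 100)
puts format("9000 byte frames: %.1f%% payload", jumbo * 100)
puts "frames needed per 9000 bytes: #{9000 / 1500} vs 1"
```

The bandwidth gain from the headers alone is only a few percent; the bigger win is that the NIC and CPU handle one sixth as many frames and interrupts for the same data.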

So I started looking at the configuration for the on-board network adapter on my Intel P35.  No Jumbo Frame option, but I found this note:

Note: The Intel PRO/1000 PL Network Connection supports jumbo frames in Microsoft* Windows* operating systems only when Intel® PROSet for Windows Device Manager is installed.

Cool! So I installed Intel ProWin and still couldn’t find the JF option. I did a little more research and found this:

The following gigabit LAN components included with Intel® Desktop Boards do not support jumbo frames:

  • Intel® 82566DM Gigabit Ethernet Controller
  • Intel® 82566DC Gigabit Ethernet Controller

Eeeekk! My motherboard chipset doesn’t support jumbo frames! So it was off to Amazon to see how much a 1Gb PCIe network card with JF support would set me back. Since 1Gb is no longer the bleeding edge (they now have 10Gb NICs over Cat6), this StarTech card was just under $25. This also allowed me to do some real world performance testing between the two using Lan Speed Test:

| HW | Intel 82566DC | StarTech ST1000SPEX |
|---|---|---|
| SMB Read MB/s | 48.05 | 45.8 |
| SMB Write MB/s | 24.28 | 33.43 |

So I’m pleased with the roughly 38% speed improvement on write. I read somewhere that JFs aren’t used for reads, hence there wasn’t any significant difference there. So I’m all set, right? Oops, wait a minute. It turns out that Windows Media Center won’t record to a network drive. This is part of the DRM associated with CableCARD, which I’ve ranted about before. It turns out the new ATI BIOS with relaxed OCUR standards addressed most of my CableCARD concerns. This left two possible solutions:

  1. Record to the SSD and use DVRMSToolbox to move the recordings to the NAS (after commercial detection)
  2. Use iSCSI rather than SMB (Microsoft File Sharing)

Once again going back 30 years, there were a couple of competing standards for attaching disk drives to computers. One of these was called SCSI and was championed by Apple and Sun (as well as many others). SCSI had a somewhat higher-level command structure and some interesting chaining features that are similar to today’s USB features. PCs meanwhile were using IDE interfaces, which have evolved in their own direction. Fast forward 30 years, and we have these network cables which are now as fast as those big thick SCSI cables, so why not send the SCSI protocol over that? Now you have iSCSI.

So the cool part is, you use iSCSI, and Windows thinks the drive is a local SCSI drive, not a remote NAS drive. Of course, since I bought the cheaper ReadyNAS NV+ rather than the latest and greatest ReadyNAS Ultra, iSCSI support was not yet built in. Enter the open source world to the rescue. Since the ReadyNAS NV+ is basically a little SPARC machine running Linux, Stephan at http://whocares.de/ ported the Linux iSCSI Target daemon. If you go this route, be sure to check out his support page, which was a little tricky to find. In short, the original directions pointed you to the wrong config file location, as I explain here:

Downloaded and installed 1.4.20.2. After following the instructions verbatim, I realized my target was not being created and spent a lot of time searching the net for the cause of this message:
iscsi_trgt: iscsi_target_create(131) The length of the target name is zero

I finally came back here and read all the comments. The problem the whole time was ietd.conf needs to be /etc/ietd, not /etc like the instructions say. :-(

Hopefully google will find this comment for the next guy who comes along.

I mention a couple other quirks on the support page, but the above is the only one that matters.   So back to performance testing via Lan Speed Test:

| Protocol | iSCSI | SMB |
|---|---|---|
| Read MB/s | 21.05 | 45.8 |
| Write MB/s | 191.28 | 33.43 |

Whoa, check that out! Nearly a 6x improvement in write performance! In fact, it now writes about one and a half times as fast as the theoretical network maximum… umm… wait-a-second…. That’s probably not right…
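The sanity check is simple arithmetic: a gigabit link moves at most 1,000,000,000 bits per second, which is 125 MB/s before any protocol overhead. A minimal sketch, assuming the benchmark reports decimal megabytes (if it reports MiB the ceiling is lower still) and using made-up function names:

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # gigabit Ethernet line rate


def max_megabytes_per_second(link_bps=LINK_BITS_PER_SECOND):
    """Upper bound on throughput in MB/s, ignoring all protocol overhead."""
    return link_bps / 8 / 1_000_000


def is_plausible(measured_mb_s, link_bps=LINK_BITS_PER_SECOND):
    """True if a measured rate could physically have crossed the link."""
    return measured_mb_s <= max_megabytes_per_second(link_bps)


if __name__ == "__main__":
    # Figures from the tables in this post.
    results = {
        "iSCSI read": 21.05,
        "iSCSI write": 191.28,
        "SMB read": 45.8,
        "SMB write": 33.43,
    }
    for name, rate in results.items():
        verdict = "ok" if is_plausible(rate) else "faster than the wire, suspect caching"
        print(f"{name}: {rate} MB/s ({verdict})")
```

Any result above the ceiling means the benchmark was measuring something other than the network.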

A little more investigation showed that because Windows considers it a SCSI drive, there’s lots of local caching going on, which was fooling Lan Speed Test. Using Blackmagic’s Disk Speed Test (which also won’t work on a network drive), write speeds were around 14MB/s. So the freeware implementation of iSCSI leaves a bit to be desired performance-wise. There may be some other things you could do via direct device access and later versions of iSCSI, but I decided to go back to the DVRMSToolbox solution.

So I’m actually pretty happy with the current solution where I record to a temporary directory on the SSD and then move the file over to the NAS.   This also allows the Dragon Global Showanalyzer to work on the files locally rather than scanning them over the network.

09.27.10

Intel CPU Info tool.

Posted in Uncategorized at 6:09 pm by mike

Cool geekware to tell you about your CPU: http://www.cpuid.com/downloads/cpu-z/1.55-setup-en.exe. Be sure to disable the evil Ask toolbar window from the install wizard.

09.26.10

Open Apple Pages .pages files in Windows

Posted in Windows at 11:11 am by mike

Quick fix for Grammie. Someone had sent her some .pages files and she couldn’t open them on Windows 7. Turns out that they are WinZip-compatible file archives with a PDF preview in them. Since she doesn’t even have WinZip installed, I simply exported, edited, and imported the .zip registry entry so that Windows File Explorer would do the trick. It ended up looking like this:

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.pages]
"PerceivedType"="compressed"
"Content Type"="application/x-zip-compressed"
@="CompressedFolder"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.pages\CompressedFolder]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.pages\CompressedFolder\ShellNew]
"Data"=hex:50,4b,05,06,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
"ItemName"=hex(2):40,00,25,00,53,00,79,00,73,00,74,00,65,00,6d,00,52,00,6f,00,\
  6f,00,74,00,25,00,5c,00,73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,\
  00,7a,00,69,00,70,00,66,00,6c,00,64,00,72,00,2e,00,64,00,6c,00,6c,00,2c,00,\
  2d,00,31,00,30,00,31,00,39,00,34,00,00,00
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.pages\OpenWithProgids]
"CompressedFolder"=""
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.pages\PersistentHandler]
@="{098f2470-bae0-11cd-b579-08002b30bfeb}"

I’m not sure if all those GUID and Shell data values are universal, but if importing the above doesn’t work, I’m sure you can figure out the same export/edit trick I did. Note you need to log out and log in again to restart the base explorer shell. Enjoy.
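For a one-off file, a script can also pull the preview out without touching the registry. This is a minimal sketch, assuming the .pages file really is a zip archive with at least one PDF member, as described above. The function name and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_pdf_previews(pages_path, out_dir):
    """Copy every PDF found inside a .pages archive into out_dir.

    Returns the list of files written. The list is empty if the archive
    holds no PDF, which may happen with files that lack a PDF preview.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pages_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".pdf"):
                # Flatten any internal folder structure to just the file name.
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                written.append(target)
    return written
```

Calling `extract_pdf_previews("letter.pages", "previews")` would write any PDF it finds into a `previews` folder next to the script.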
