Making some progress

Even though I haven’t been devoting as much time to it as I either could or should, I’ve made some decent progress on my aggregator. Maybe it isn’t as much as it could be, but it’s enough to make me happy, and since I’m not getting paid, that’s all that matters right now, right?

Let’s see… where to begin? Well, I guess I can list things in no particular order, and let it go from there.

  1. The “biggie” is that there’s now an official name for the project. I thought I’d be clever, and had been using SpeedReader internally, but then I decided to search for the name, and it turns out that not only is there already a SpeedReader, but it’s also an aggregator. Damn. So, I thought long and hard about it (a few minutes) and decided, based on my initials and a dearth of hits from Google, that the name maggregate would be good. Well, if not good, at least unique (I had originally, jokingly, referred to the project as maggle, a take on raggle, but it seemed a bit too derivative a name. Sorry Tim, I know you liked that one). So, that’s what I’m calling it from now on. Maggregate.
  2. A lot of extraneous crashes have been dealt with. I had been adding exception handling in the few places where it’s most likely to be needed (network and parsing) on a case-by-case basis, but I kept coming up with new cases. When I took the wireless interface down on the laptop I develop on and got yet another unhandled exception, I finally decided to add a “catchall” rescue block to each place that needed one, since those errors are all technically recoverable. Since then, no crashes. Yay. And the logging differentiates enough (error for the catchall vs. warning for explicitly handled cases) that I can quickly see what I’ve been too lazy to deal with explicitly. There’s a rough sketch of the pattern after this list.
  3. Maggregate now accepts gzip-encoded content! I’d always wanted to do this, but my prior experiences dealing with raw Zlib (in C) kept me from relishing having to do it again. Lucky for me, the Ruby library is nice enough that it was almost trivial. In fact, the hardest part was realizing that I needed to wrap the body from the HTTP response in a StringIO object instead of leaving it as a string. Once that was done, it was dead simple (see the sketch after this list).
  4. Partial cookie support is in place. Well, that should really be “preliminary”, as in there’s just a bit of code to recognize cookies, and there’s now a table in the maggregate database that will eventually store them. So the logs let you know that you were sent a cookie (or several), but that’s currently it; there’s a small sketch of that stage after this list, too.

    As an aside, having looked at RFC 2109 and RFC 2965, it appears no one is using them… every single server sending cookies that I’ve come across just uses the old Netscape cookie specification. Keep in mind, RFC 2109 dates from February 1997, and RFC 2965 from October 2000. Not much of an issue, but I think my jaw will drop the first time I spy a max-age instead of an expires in a cookie header.

  5. Potential support for RFC 3229 with “feed”. This was more of an “I wonder how easy it would be…” deal than anything else. I was reading something, followed a URL in a comment, and one or two clicks later ended up at that page. For those folks who haven’t memorized the RFC index, RFC 3229 is “Delta encoding in HTTP”, which is basically a way to request a resource over HTTP and get just what has changed instead of having to transfer the entire thing again. The “with feed” part is an extension that purports to handle feed deltas in a manner that doesn’t break encapsulation: it just gives you a “feed” containing only the new or changed articles.

    My biggest concern was having to support a new HTTP status code, 226 IM Used, as the Ruby Net::HTTP library doesn’t have support for it. I should have known that it wouldn’t be an issue, thanks to Ruby’s allowing one to extend existing classes fairly easily. How easily? This easily:

    require 'net/http'
    
    module Net
      class HTTPIMUsed < HTTPSuccess        # 226
        HAS_BODY = true
      end
    
      class HTTPResponse
        CODE_TO_OBJ['226'] = HTTPIMUsed   # the table is keyed by the status code as a string
      end
    end

    That’s in the maggregate file that does all the network and HTTP connections. That’s it. Now, in maggregate at least, Net::HTTP understands the proper status code for a successful response for RFC 3229. Nice, no?

    Now, I say it’s potential support because, so far, not one single feed out of over 300 has ever responded with either a 226 response or the associated additional HTTP headers. But due to the way I handle adding entries now, I don’t expect any issues if and when a server decides to respond correctly. (The request side of this is sketched after the list as well.)
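
For the catchall rescue from item 2, this is roughly the shape I mean. It’s a sketch, not maggregate’s actual code: the method name, the logger, and the list of explicitly handled exceptions are all just illustrative.

    require 'socket'
    require 'timeout'
    require 'logger'

    LOG = Logger.new($stderr)

    # Wrap one feed's worth of fetching/parsing; the caller passes the real work
    # in as a block, e.g. with_feed_rescue(uri) { fetch_and_parse(uri) }.
    def with_feed_rescue(uri)
      yield
    rescue Timeout::Error, SocketError => e
      # Failures handled explicitly elsewhere only rate a warning.
      LOG.warn("#{uri}: #{e.class}: #{e.message}")
      nil
    rescue StandardError => e
      # The catchall: still recoverable, but logged as an error so it's obvious
      # what hasn't been dealt with explicitly yet.
      LOG.error("#{uri}: #{e.class}: #{e.message}")
      nil
    end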
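
For the gzip handling from item 3, here’s a minimal sketch of the StringIO trick. The URL is just a placeholder, and a real fetch would do this inside whatever code already drives Net::HTTP.

    require 'net/http'
    require 'uri'
    require 'zlib'
    require 'stringio'

    uri = URI.parse('http://example.com/feed.xml')   # placeholder feed URL

    response = Net::HTTP.start(uri.host, uri.port) do |http|
      # Let the server know gzip is welcome.
      http.get(uri.path, 'Accept-Encoding' => 'gzip')
    end

    body = response.body
    if response['Content-Encoding'] == 'gzip'
      # GzipReader wants an IO-ish object, hence wrapping the body in StringIO.
      body = Zlib::GzipReader.new(StringIO.new(body)).read
    end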
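
The cookie “support” from item 4 currently amounts to little more than the following; the URL is a placeholder again, and the part that would write to the cookies table doesn’t exist yet.

    require 'net/http'
    require 'uri'
    require 'logger'

    LOG = Logger.new($stderr)
    uri = URI.parse('http://example.com/feed.xml')   # placeholder feed URL

    response = Net::HTTP.get_response(uri)
    # Zero or more Set-Cookie headers; so far every one has been a Netscape-style
    # "name=value; expires=...; path=/" string rather than RFC 2109/2965.
    (response.get_fields('Set-Cookie') || []).each do |cookie|
      LOG.info("cookie from #{uri.host}: #{cookie}")
    end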
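
And the request side of the RFC 3229 “feed” business from item 5 looks roughly like this. The ETag value and URL are invented, and it assumes the HTTPIMUsed snippet above has already been loaded so Net::HTTP knows what a 226 is.

    require 'net/http'
    require 'uri'

    uri  = URI.parse('http://example.com/feed.xml')  # placeholder feed URL
    etag = '"abc123"'                                # ETag saved from the previous fetch

    response = Net::HTTP.start(uri.host, uri.port) do |http|
      # A-IM: feed asks for a delta; If-None-Match says which instance we already have.
      http.get(uri.path, 'A-IM' => 'feed', 'If-None-Match' => etag)
    end

    case response
    when Net::HTTPIMUsed
      # 226: the body holds only the new/changed entries
    when Net::HTTPNotModified
      # 304: nothing has changed at all
    when Net::HTTPOK
      # 200: the server ignored A-IM and sent the whole feed as usual
    end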

The UI is still completely unusable, as I’ve been verifying that things are working through logging and by using the sqlite3 executable to examine the database by hand. Despite the number of crashes, I’ve yet to corrupt any data, either. I suppose the UI is the next real step, so I should finally learn more about ncurses and how to interact with it.
