|
Links, links, links
A week ago I wrote about hyper cross-linking. I've done the groundwork for implementing it in my aggregator, but unfortunately I've found the ground to be quite muddy. (BTW: doing this was not just an excuse to stop working on the user interface of the aggregator. It wasn't! Just because I like doing back-end work about a million times as much as I like doing user interfaces, that doesn't mean....)
Ahem! Where was I? Oh, yeah... the mud!
The basic idea was to have the aggregator retrieve the HTML for the articles and parse out the links. The problem is that although all RSS feeds provide an HTTP link, there is really no reliable way to get at the HTML for a specific article. The links in RSS feeds almost always go to pages with much more than just the article, and the page may contain multiple articles. Usually, but not always, a "#" character in the URL indicates this, in which case I could write code to find the beginning of the article... but I would have no idea where the end is. Sure, I'd find the links, but I wouldn't really know which specific articles they came from, and I'd end up duplicating a lot of links. That wouldn't be a horrible problem if I were just doing this for myself, but it's not good enough for my real intent.
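To make the mud a bit more concrete, here is a minimal Python sketch of the naive approach. It is not the aggregator's actual code: the names `LinkCollector` and `extract_links` are made up for illustration, and it assumes the page HTML has already been fetched. It collects every link on the page an RSS item points to, and reports whether the item URL carries a "#" fragment (the hint that the page is probably shared by several articles).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the absolute target of every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(page_html, item_url):
    """Return (links, probably_shared) for the page behind an RSS item link.

    probably_shared is True when the item URL has a "#" fragment. The
    fragment only marks where an article starts, not where it ends, so
    every link on the whole page is returned either way.
    """
    page_url, fragment = urldefrag(item_url)
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    collector.close()
    return collector.links, bool(fragment)
```

Run this for two items that live on the same page (say `blog.html#a1` and `blog.html#a2`) and both come back with the identical list of links, which is exactly the duplication problem.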
The idea was to be able to share the link information (via RSS) with other users of the aggregator. That way we could all exchange link information with each other, and if you pulled in a feed from someone who links to my blog, I would find out about it from the link you send me even though I don't happen to read that blogger. But with lots of duplicate links being exchanged as a result of the inability to isolate the start and end of the actual article text, this service loses a lot of its value.
Back to the drawing board.
|