My script tells you which LJ users you share the most interests with.
#!/usr/bin/perl -w
$L=10; die " usage: ./lj_interesting.pl ljusername
This script will show you the $L LiveJournal users whom you share the most
interests with, and tell you what those interests are. It doesn't work as
well for really popular interests, as livejournal only shows the first 500
users with a given interest, but the less common interests are probably
more interesting anyway. This script is released in the public domain by
the author, phyxeld. (notcopyright) 2003\n" unless $ARGV[0]; crawl(@ARGV);
sub crawl {
my $user=shift;
warn "Searching for users who share interests with $user ...\n";
my @int=scrape($user,0); my $t=$#int+1;
for $int (@int) {
printf STDERR (qq"\x0d\x1b[K %3d%% [%-30s] request %d ".
q[of %d "%s"], (++$i/$t*100),('*'x($i/$t*30)),$i,$t,$int);
push @{$d{$_}},$int for (scrape($int,1));
}
print "\nUser '$_' has ",$#{$d{$_}}+1," common interests:\n ",
join(', ',@{$d{$_}}),"\n" for (grep { $_ ne lc $user && 0<$L--}
sort {scalar @{$d{$b}}<=>scalar @{$d{$a}}} keys %d)
}
sub scrape { # scrape can do two things: @interests = scrape(user,0)
my ($q,$w)=@_; my @r=(); # @users = scrape(interest,1)
$re=$w?qr[rinfo\.bml\?user=([\w+]+)']:qr[sts\.bml\?int=([\w+]+)'];
my $u=$w?'interests.bml?int=':'userinfo.bml?user='; push @r, (m/$re/g)
for qx[curl "http://www.livejournal.com/$u$q" 2>/dev/null]; return @r;
}
Naturally, the more interests you have listed, the better your results will be... and the longer the script will take to run (so a visual progress bar is displayed while the script works). The added load on LJ could potentially get heavy if a lot of people ran this; the script sends one HTTP request for each of your interests, plus one for your userinfo page. If this becomes a problem for LJ (I doubt it will), then I'll take it down. Due to my extreme laziness, I'm fetching the pages with curl, so you'll need that installed if you want to try this out. (Or replace the single occurrence of the word curl in the source with wget -O -)
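Here is a rough sketch of how the curl-or-wget choice could be made automatically instead of by editing the source. The helper names (have_prog, fetch_command, request_count) are made up for this example and are not part of the script above; the only flags used are curl -s and wget -q -O -, and nothing is actually fetched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: true if an executable called $prog is on the PATH.
sub have_prog {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $path = File::Spec->catfile($dir, $prog);
        return 1 if -f $path && -x _;
    }
    return 0;
}

# Hypothetical helper: choose the fetch command, preferring curl.
# $have is a code ref so the choice can be exercised without either tool.
sub fetch_command {
    my ($have) = @_;
    return ('curl', '-s')            if $have->('curl');
    return ('wget', '-q', '-O', '-') if $have->('wget');
    return;    # empty list: nothing usable was found
}

# One request for the userinfo page, plus one per listed interest.
sub request_count {
    my (@interests) = @_;
    return 1 + @interests;
}

my @cmd = fetch_command(\&have_prog);
print @cmd ? "would fetch with: @cmd\n" : "neither curl nor wget found\n";
print "requests for 69 interests: ", request_count((1) x 69), "\n";
```

The request count is the number to keep in mind when thinking about load on LJ: it grows by one with every interest you list.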
If anyone in
$ lj_interesting.pl jwz
Searching for users who share interests with jwz ...
 100% [******************************] request 69 of 69 "xemacs"
User 'jw_izz' has 28 common interests:
 brassy, cabaret+voltaire, cop+shoot+cop, cyber+fashion, cypherpunk, die+warzau, dna+lounge, emacs, emergency+broadcast+network, frank+miller, hanzel+und+gretyl, harlan+ellison, internet+radio, john+varley, jwz, killing+the+riaa, low+pop+suicide, monkey+butter, psytrance, retrocomputing, schadenfreude, screen+savers, shriekback, surveilance, the+singularity, vernor+vinge, waxtrax, webcasting
User 'blackavar' has 17 common interests:
 cabaret+voltaire, cyber+fashion, cypherpunk, die+warzau, dna+lounge, emacs, frank+miller, internet+radio, jwz, killing+the+riaa, retrocomputing, schadenfreude, screen+savers, shriekback, vernor+vinge, waxtrax, xemacs
User 'confuseme' has 14 common interests:
 autechre, blade+runner, cabaret+voltaire, cyberpunk, cypherpunk, drum+and+bass, front+242, hacking, killing+the+riaa, lain, psytrance, shriekback, transmetropolitan, william+gibson
User 'ivorjawa' has 11 common interests:
 culture+jamming, cypherpunk, internet+radio, john+varley, jwz, killing+the+riaa, schadenfreude, security, shriekback, vernor+vinge, waxtrax
User 'dnalounge' has 10 common interests:
 cypherpunk, dna+lounge, internet+radio, jwz, killing+the+riaa, monkey+butter, psytrance, screen+savers, surveilance, webcasting
User 'spot' has 9 common interests:
 buffy+the+vampire+slayer, comics, fight+club, hacking, neal+stephenson, nine+inch+nails, science+fiction, sushi, william+gibson
User 'machinegirl' has 9 common interests:
 blade+runner, cyberpunk, fight+club, front+242, ghost+in+the+shell, lain, psytrance, sushi, william+gibson
User 'slithead' has 9 common interests:
 autechre, cyberpunk, front+242, harlan+ellison, neal+stephenson, schadenfreude, unix, warren+ellis, william+gibson
User 'azurecobalt' has 9 common interests:
 24, cyberpunk, farscape, lain, neal+stephenson, the+matrix, transmetropolitan, warren+ellis, william+gibson
User 'rasp_utin' has 9 common interests:
 aeon+flux, autechre, blade+runner, cyberpunk, electro, hanzel+und+gretyl, pop+will+eat+itself, waxtrax, william+gibson
This could obviously be cleaned up a bit :)
It would be relatively easy to make the script take input from a CGI, and htmlize the output to make all the names and interests LJ links... but then I think enough people would use it that LJ might take issue with the extra bandwidth consumption. So it's probably better to (a) keep it as a script that stays in the terminal, or (b) implement this the right way (as part of LiveJournal).
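For what it's worth, the htmlizing half of that idea could look something like the sketch below. It only turns one result into HTML, reusing the userinfo.bml?user= and interests.bml?int= URLs the script already requests; the function names (esc, user_link, interest_link, html_result) are invented for this example, and the CGI input side is left out entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = 'http://www.livejournal.com';

# Escape the characters that matter inside HTML text and attribute values.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Link a username to its userinfo page.
sub user_link {
    my ($user) = @_;
    return sprintf '<a href="%s/userinfo.bml?user=%s">%s</a>',
        $base, esc($user), esc($user);
}

# Link an interest to its interests page. The scraped form uses "+" for
# spaces, so that form goes in the URL and a readable label is shown.
sub interest_link {
    my ($interest) = @_;
    (my $label = $interest) =~ tr/+/ /;
    return sprintf '<a href="%s/interests.bml?int=%s">%s</a>',
        $base, esc($interest), esc($label);
}

# One paragraph per matching user, mirroring the terminal output.
sub html_result {
    my ($user, @interests) = @_;
    return sprintf "<p>User %s has %d common interests: %s</p>\n",
        user_link($user), scalar @interests,
        join(', ', map { interest_link($_) } @interests);
}

print html_result('dnalounge', 'dna+lounge', 'jwz');
```

This only changes how results are printed; it sends exactly as many requests as the terminal version, so the bandwidth concern above still applies.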
September 4 2003, 11:59:42 UTC
Like I said, if I'm wrong and it's a problem for you guys, I'll be glad to stop / take down the script.
Is there a better way to do this than what I'm doing now? I'm a fan of having RSS feeds everywhere; RSS of my friends page is something I've wished I had for a while, but RSS output from interests.bml would sure be cool too (and would make this use slightly less bandwidth).
Btw, my next scraper project will probably be a friends-page RSS feed, which I was originally going to build from the HTML but am now thinking about doing by merging each friend's actual RSS feed. I don't know how I'd go about hosting that, as the script needs the reader's LJ credentials to see friends-only posts (the whole point), so I'd probably just give that away as a script too. Something that could be run from cron that would (a) check my userinfo page, (b) check the RSS feeds of everybody on my friends list, and (c) merge them together and write out a static RSS file with all the latest entries. Again, this is something that could be done much better server-side, of course... :)
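To make step (c) a bit more concrete, here is a minimal sketch of just the merge. It assumes the feeds have already been fetched and parsed into lists of items with a title, a link and a Unix timestamp; the sample data, the example.com links and the names merge_feeds, esc and rss are all made up for illustration, and fetching, LJ credentials and real feed parsing are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Made-up sample data standing in for already-parsed friend feeds.
my %feeds = (
    alice => [
        { title => 'Alice, older post', link => 'http://example.com/alice/1', time => 1062660000 },
        { title => 'Alice, newer post', link => 'http://example.com/alice/2', time => 1062670000 },
    ],
    bob => [
        { title => 'Bob, middle post', link => 'http://example.com/bob/1', time => 1062665000 },
    ],
);

# Flatten all feeds, tag each item with its author, and keep the newest
# $limit items (ties broken by link so the order is repeatable).
sub merge_feeds {
    my ($feeds, $limit) = @_;
    my @all;
    for my $who (keys %$feeds) {
        push @all, map { +{ %$_, author => $who } } @{ $feeds->{$who} };
    }
    @all = sort { $b->{time} <=> $a->{time} || $a->{link} cmp $b->{link} } @all;
    splice @all, $limit if @all > $limit;
    return @all;
}

# Escape text for use inside XML elements.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Render the merged items as a small RSS 2.0 document.
sub rss {
    my (@items) = @_;
    my $out = qq{<?xml version="1.0"?>\n<rss version="2.0">\n<channel>\n}
            . qq{<title>Merged friends feed</title>\n}
            . qq{<link>http://example.com/</link>\n}
            . qq{<description>Latest entries from several feeds</description>\n};
    for my $item (@items) {
        my $date = strftime('%a, %d %b %Y %H:%M:%S GMT', gmtime($item->{time}));
        $out .= sprintf "<item><title>%s: %s</title><link>%s</link><pubDate>%s</pubDate></item>\n",
            esc($item->{author}), esc($item->{title}), esc($item->{link}), $date;
    }
    return $out . "</channel>\n</rss>\n";
}

print rss(merge_feeds(\%feeds, 25));
```

From cron, the output would just be redirected to wherever the static file lives.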