David Trejo's Thoughts

Convert all line endings to LF

Why? So that you can check your node_modules folder into git even when some of its files have mixed line endings.

Use dos2unix, which you can install with MacPorts:

sudo port install dos2unix

And then use it:

dos2unix filetoconvert.js

Or if there are many offending files:

find path/to/bad/files/ -type f -print0 | xargs -0 dos2unix

And then git add whatever you like and you’ll be good to go!
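If you'd rather not install MacPorts, Node itself can do the same CRLF-to-LF conversion. This is a minimal sketch, not a replacement for dos2unix. It assumes every file under the directory is UTF-8 text, rewrites files in place, and only converts CRLF pairs. `toLF` and `convertTree` are hypothetical names made up for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: replace every CRLF pair with a bare LF.
// Lone CR or LF characters are left untouched.
function toLF(text) {
  return text.replace(/\r\n/g, '\n');
}

// Hypothetical helper: recursively convert every regular file under `dir`.
// Assumes all files are UTF-8 text; binary files would be corrupted.
// Returns the paths of the files that were actually rewritten.
function convertTree(dir) {
  const changed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      changed.push(...convertTree(full));
    } else if (entry.isFile()) {
      const original = fs.readFileSync(full, 'utf8');
      const converted = toLF(original);
      if (converted !== original) {
        fs.writeFileSync(full, converted);
        changed.push(full);
      }
    }
  }
  return changed;
}
```

Calling `convertTree('path/to/bad/files')` returns the list of files it rewrote, which doubles as a checklist for the `git add` step. Back up first if the tree might contain binary files.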

Filed under  //   crlf   git   lf   line endings   node modules   nodejs  

Scraping Made Easy with jQuery and SelectorGadget


A few days ago I was doing a TON of scraping, and as you know, without the right tools, scraping can be a REAL pain. Out of my pain comes your pleasure — here’s a list of scraping tools and resources which will make your life MUCH easier the next time you need some information from a crufty old website. If you’re short on time, skip to the end and read the tl;dr.

Scraping with jQuery and Node.js

Scraping with jQuery is a real pleasure. Here’s some example code to get you started; it gets the top three articles and their point values from Hacker News. The key part:

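var jsdom = require('jsdom').jsdom;

// agent.body holds the HTML of the page fetched earlier in the full example
var window = jsdom(agent.body).createWindow()
  , $ = require('jquery').create(window);

// scrape!
var titles = $('.title a')
  , points = $('.subtext span');

// keep only the first three stories, formatted as "points<TAB>title"
var printme = $.map(points, function(el, i) {
  if (i < 3) {
    return $(el).text() + '\t' + $(titles[i]).text();
  }
});

console.log(printme.join('\n'));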

To get the example working, make sure you’ve installed node.js and npm. You may find run.js very helpful for automatically rerunning your scraper whenever you make a change.
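The `$.map` call does two jobs: it pairs each score with the title at the same index, and, because jQuery's `$.map` drops `undefined` return values, it also keeps only the first three. Here is the same logic with plain arrays, which is handy for testing the formatting without fetching a page. `formatTopStories` and the sample strings are made up; with jsdom you would pass in the `.text()` of each matched element.

```javascript
// Hypothetical helper: pair scores with titles by index and keep the first `n`,
// formatted "points<TAB>title", one story per line.
// slice(0, n) plays the role of returning undefined for i >= n in the $.map version.
function formatTopStories(points, titles, n) {
  return points
    .slice(0, n)
    .map((score, i) => score + '\t' + titles[i])
    .join('\n');
}

console.log(formatTopStories(
  ['120 points', '98 points', '75 points', '40 points'],
  ['First story', 'Second story', 'Third story', 'Fourth story'],
  3
));
```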

Test your code live against the page with jqueryify

Now that you can run any jQuery code you’d like, you can start testing your code against the page. I like to use Firebug or Google Chrome’s built-in console. Not all pages include jQuery, and some include an old version, so it’s best to load your own copy with the jqueryify bookmarklet. It’s also a good idea to change the version of jQuery the bookmarklet loads to match the one your scraper uses through jsdom, to avoid any strange bugs.
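The idea behind jqueryify can be sketched as a function that appends a `<script>` tag for a specific jQuery build. This is not the bookmarklet's actual source: `loadJQuery` is a made-up name, and the URL assumes the jQuery CDN's `https://code.jquery.com/jquery-<version>.min.js` naming. In a real console you would call `loadJQuery(document, '1.4.2')`; taking the document as a parameter just keeps the sketch testable outside a browser.

```javascript
// Hypothetical helper: append a <script> that loads the given jQuery version into `doc`.
// Once it loads, window.jQuery / window.$ refer to that version, replacing any older copy.
function loadJQuery(doc, version) {
  const script = doc.createElement('script');
  script.src = 'https://code.jquery.com/jquery-' + version + '.min.js';
  // Report which version is active once the download finishes.
  script.onload = function () {
    console.log('jQuery ' + doc.defaultView.jQuery.fn.jquery + ' loaded');
  };
  (doc.head || doc.documentElement).appendChild(script);
  return script;
}
```

Pick the same version string your scraper's jQuery uses so the console and jsdom agree.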

Find the shortest selector with Selector Gadget

Next up is finding the correct selectors for the information you want to extract. Selector Gadget makes it very easy to find the least complicated selector that still does the job. Of course, you can always choose selectors by hand if you don’t fancy Selector Gadget, but it is super helpful for crufty, nasty sites.

Scrape dynamic pages that use javascript to populate information (arg!)

At this point you can easily find selectors and test them on the live page, but there is danger ahead if you encounter a site that uses javascript or AJAX to populate the information. Your tests in the browser will work just fine, but when you load the page programmatically, the page’s javascript won’t be run. There are a few ways to get around this:

  1. Use a regex to extract the values from the javascript written into script tags on the page.
  2. Pretend to be the page and make requests to the AJAX URLs to get the information you need.
  3. URL hack. There may still be a URL hack you haven’t tried which will give you the information you need. (In this case, the request module is your friend.)
  4. Run the page’s javascript programmatically, then scrape the information after it’s been slotted into the page. (See jsdom’s documentation.)
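Option 1 is often the quickest. A minimal sketch, assuming the page embeds its data as `var stories = [...];` inside a script tag. The variable name, the sample HTML, and `extractScriptVar` are all made up, and real pages will need their own pattern. Parsing the captured text with `JSON.parse` only works when the literal happens to be valid JSON (double-quoted keys and strings).

```javascript
// Hypothetical helper: pull the literal assigned with `var <name> = ...;` out of inline script code.
// Assumes `name` is a plain identifier and the literal is valid JSON ending in "];" or "};".
function extractScriptVar(html, name) {
  const re = new RegExp(
    'var\\s+' + name + '\\s*=\\s*(\\[[\\s\\S]*?\\]|\\{[\\s\\S]*?\\});'
  );
  const match = html.match(re);
  // Lazy matching stops at the first "];" or "};", so a string containing one would cut it short.
  return match ? JSON.parse(match[1]) : null;
}

const page = '<script>var stories = [{"title": "Hello", "points": 42}];</script>';
console.log(extractScriptVar(page, 'stories')[0].points); // 42
```

txt2re.com (below) is a good way to generate the pattern for your page's markup.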

Regex: tools to make your life easier

txt2re.com is a regex generator. That’s right, generator, not tester. This means you enter a string, click the parts you’d like to match, and then copy and paste the regex into your code. For me, this is pure bliss.

regexpal.com helps you quickly test your regular expressions against test data. Protip: only type the regex, omit the / at the front and end which you’d normally use in your javascript.
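The protip also works in reverse: a pattern tested in regexpal without slashes drops straight into a regex literal, or into `new RegExp` if you build it from a string, in which case every backslash has to be doubled. A small example with a made-up price pattern:

```javascript
// Pattern exactly as typed into regexpal (no surrounding slashes): \$(\d+)\.(\d{2})
const literal = /\$(\d+)\.(\d{2})/;
// Same pattern built from a string: each backslash is escaped once more.
const fromString = new RegExp('\\$(\\d+)\\.(\\d{2})');

const text = 'Price: $19.99';
console.log(literal.exec(text)[1]);                 // 19
console.log(fromString.exec(text)[2]);              // 99
console.log(literal.source === fromString.source);  // true
```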

tl;dr

  • Use jQuery and node.js for your scraping (see the example code above).
  • Test on the live page with the jqueryify bookmarklet
  • Find the best selectors with Selector Gadget
  • For pages with inline script tags containing information, use txt2re.com to generate regular expressions, and regexpal.com to test them.
  • And finally, don’t forget that URL hacking is your friend!

Thanks for reading — if you have any feedback, please leave a comment below!

PS If things are going wrong, here’s the list of common gotchas:

  • use the same version of jQuery to test as the one you use in your scraper
  • make sure the information you need is not loaded by javascript on the page

Filed under  //   jquery   nodejs  

Watch me demo my phone-controlled game! 7pm tonight! (#nytm #node.js #twilio #socket.io)

Update: You can now see a video demo of touch tone tanks! (It turns out we didn't crash and burn)

Good afternoon!

Later today Paul, Justin, and I will be demoing our phone-controlled tanks game* in front of 900 people at New York Tech Meetup! (I hope we don't crash and burn)

You can watch us live at livestream.com/nytechmeetup at 7pm tonight.

Cheers,

David

 

* In case you're curious about how this works:

First you set up a projector or other large screen, then you bust out your phones, call our phone number, and press the buttons on your keypad to shoot or move around. The tones are sent to the game over all kinds of crazy internet infrastructure (Twilio, node.js, and socket.io if you're curious. We've even open-sourced the communication bits of it so you can build your own phone controlled game or web application. See nodaphone for documentation and source.)
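The phone-to-game step can be sketched in a few lines. Twilio's `<Gather>` verb reports the keys a caller pressed in a `Digits` parameter (and the caller's number in `From`) of a form-encoded request to your app. Everything else here is an assumption for illustration, including the keypad layout and the `toGameEvents` name; this is not nodaphone's API. In the real game, events like these would then be pushed to the browser over socket.io.

```javascript
// Hypothetical keypad layout: 2/4/6/8 move like arrow keys, 5 fires.
const KEYPAD = { '2': 'up', '4': 'left', '6': 'right', '8': 'down', '5': 'fire' };

// Hypothetical helper: turn a form-encoded Twilio request body into game events.
// Keys with no mapping (0, *, # ...) are ignored.
function toGameEvents(body) {
  const params = new URLSearchParams(body);
  const digits = params.get('Digits') || '';
  const player = params.get('From');
  return digits
    .split('')
    .filter((d) => d in KEYPAD)
    .map((d) => ({ player: player, action: KEYPAD[d] }));
}
```

For example, a caller pressing 2, 6, #, 5 would produce "up", "right", and "fire" events for that phone number.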

Filed under  //   nodejs  

Node.js for Server Newbs


This is the story of how a webserver noob and thrifty college student got access to a server on which to run node.js.

This guide is for those who have used node.js locally, but want to run it on a server and show their projects to the world.

This assumes (almost) no familiarity with how hosting works, but does assume some familiarity with node.js and building from source. This guide will help you install node.js on a 64-bit Ubuntu 10.04 machine.

Here we go:

Sign up for web hosting at webbynode. Why did I choose webbynode? It offers hosting for $10/month with root access. Only prgrmr competes†, but they have $4/month worth of support, which is scary for those new to hosting, namely myself. I also shied away from webfaction ($9.50/month) because, while they do allow you to install node.js in your home directory, there are all kinds of configuration headaches (I think). In any case, these instructions are specific to webbynode with the Ubuntu setup. Update: to get 99 cents off with webbynode, use davidt at checkout.

 

 

Under deploy, click the bare Linux image, then choose Ubuntu 10.04, 64-bit. This may take a minute or so. When it is active, continue on.

On your dashboard you should see your webby's IP address; it looks like this: x.x.x.x

Open up a console* on your local machine, and run these at the command line:

localmachine$ ssh root@x.x.x.x

(Enter your password to log into your webby server)

This is key, as it installs the tools needed to build node (git, a C++ compiler, make, curl, and the OpenSSL development headers):

webby# apt-get update
webby# apt-get install git-core g++ make curl libssl-dev

(python is already installed)

 

Now download the node source and build it (make sure to use the latest version listed at http://nodejs.org/#build):

webby# curl -O http://nodejs.org/dist/node-v0.2.0.tar.gz
webby# tar xvzf node-v0.2.0.tar.gz
webby# cd node-v0.2.0
webby# ./configure
webby# make
webby# make install
webby# node --version
v0.2.0

You're ready to write some JS and serve people up!
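To check that the new install can actually serve requests, a hello-world server is enough. A minimal sketch: the `handler` and `start` names and port 8000 are arbitrary choices, and you may need to open that port in your firewall (see the bonus section below) to reach it from outside.

```javascript
const http = require('http');

// Respond to every request with a plain-text greeting that includes the node version.
// Kept separate from the server so it can be exercised without opening a socket.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from node ' + process.version + '\n');
}

// Start listening on `port` and return the server.
function start(port) {
  const server = http.createServer(handler);
  server.listen(port, () => console.log('Listening on port ' + port));
  return server;
}
```

Save it as server.js with a `start(8000);` line at the bottom, run `node server.js` on the webby, and visit http://x.x.x.x:8000/.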

Cheers!

-David

 

PS In order to start serving your site from yourdomain.com, you'll need to follow webbynode's instructions for changing the DNS settings with whoever you bought your domain from. Here's their guide: http://guides.webbynode.com/articles/webbymanager/dns-setup.html (don't forget to migrate your blog and other things!)

Bonus:

Lock down your machine with a ufw firewall: http://guides.webbynode.com/articles/security/ubuntu810-ufw.html

Use Tim's guide to restart your node process when it crashes [just pay attention to the upstart parts, if you want to keep moving parts to a minimum]

http://howtonode.org/deploying-node-with-spark
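If you'd rather not touch upstart at all, the core idea (start the process again whenever it dies) can be sketched in plain Node. This is a swapped-in technique, not what Tim's guide does, and it has a weakness upstart doesn't: if the supervisor itself dies, nothing restarts it. `supervise` and its parameters are hypothetical names made up for this example.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run `command args...`; whenever it exits with a non-zero code
// (or is killed by a signal), start it again, up to `maxRestarts` times.
// `onGiveUp` is called with the total number of runs once the limit is exceeded.
function supervise(command, args, maxRestarts, onGiveUp) {
  let runs = 0;
  function run() {
    runs += 1;
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', (err) => console.error('could not start process: ' + err.message));
    child.on('exit', (code) => {
      if (code === 0) return; // clean exit: don't restart
      if (runs > maxRestarts) {
        onGiveUp(runs);
      } else {
        console.error('process exited with code ' + code + ', restarting');
        setTimeout(run, 1000); // short pause so a crash loop doesn't spin the CPU
      }
    });
  }
  run();
}
```

Usage might look like `supervise('node', ['server.js'], 10, (n) => console.error('gave up after ' + n + ' runs'));`.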

 

† that I know of — let me know if you find other good options!

* If you are using a Windows PC, you may need to install an SSH client such as PuTTY.

Filed under  //   howto   nodejs  

How to Stay Up-to-date with Node.js


Node.js is changing every day and new releases are very frequent. Here is how I stay up to date with the latest release version of Node.js. 

This guide assumes you already have node.js installed and you are familiar with the command line.

1. Install npm (full npm install guide)

curl http://npmjs.org/install.sh | sh

Or update it if you've already installed it

npm install npm

2. Install nave with npm

npm install nave

3. Use nave to run the version of your choice in a subshell

nave use 0.1.102

4. All done! Repeat from step 3 whenever a new version of node is released. By the way, npm is great: you can even use it to update itself :)
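After a few `nave use` switches it's easy to lose track of which node is running. A quick check from inside a script: `process.version` is a real Node property (a string like `v0.1.102`), while `atLeast` is a hypothetical helper that only compares plain `major.minor.patch` numbers and ignores tags like `-pre`. Comparing numerically matters here, since a plain string comparison would rank `0.1.99` above `0.1.102`.

```javascript
// Hypothetical helper: compare "vMAJOR.MINOR.PATCH" strings (leading "v" optional) part by part.
// Returns true if `current` is the same as or newer than `required`.
function atLeast(current, required) {
  const parse = (v) => v.replace(/^v/, '').split('-')[0].split('.').map(Number);
  const a = parse(current);
  const b = parse(required);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

if (!atLeast(process.version, '0.1.102')) {
  console.error('This script needs node 0.1.102 or newer; try `nave use 0.1.102`.');
}
```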

 

What tools do you use to stay up to date? If there is a better way, please let me know!

Filed under  //   nave   nodejs   npm  

How to Install Node.JS on Windows


Update 11/23/11: Download the official node.exe binary or .msi at http://nodejs.org/#download; install npm using these steps http://npmjs.org/doc/README.html#Installing-on-Windows-Experimental (note that there are plans to include npm in the .msi installer).

Update: Here are the official instructions: github.com/joyent/node/wiki/Building-node.js-on-Cygwin-(Windows)

 

I wanted to install node on my windows machine and I stumbled across this thread.

All you need to do is download the file below and then unzip it.

http://dl.dropbox.com/u/341900/nodejs/nodejs-0.1.99-cygwin-full.zip 

Then run bash.bat and do something like this:

 cd ~/Desktop

 nodejs-0.1.99-cygwin-full/node/bin/node.exe example.js
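The post doesn't show what example.js contains; any script will do. A minimal stand-in (its contents are an assumption) that just reports which node is running, so you can tell the build works:

```javascript
// example.js: a tiny stand-in script that reports the running node version and platform.
function describeRuntime() {
  return 'node ' + process.version + ' running on ' + process.platform;
}

console.log(describeRuntime());
```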

A big thanks to Simon Sturmer for posting the link!

mirror: http://dl.dropbox.com/u/10047/nodejs-0.1.99-cygwin-full.zip

 

Update 11/23: Cygwin is no longer supported. See the top of the page.

The less easy way (will work even if node 0.1.99 gets old)

Go to http://nodejs.org/#download and download the source tarball (the .tar.gz file).

Download Cygwin at http://www.cygwin.com/ (node needs it to build on Windows). The download link is at the top right. This could take a while.

Open Cygwin via Start > Cygwin > Cygwin Bash Shell.

Navigate to the location of the tarball using the "ls" and "cd" commands. Mine is on my desktop.

When the tarball is in the current directory, run

tar zxvf node<tab>

(Press tab after you type node to trigger auto-complete so there are no typos.)

Congrats! Now you have an extracted copy of node on your desktop. Next, set up Python 2.4 or greater.

Download the latest version of Python at http://www.python.org/download/ (node's build scripts need it).

When that finishes, go back to the Cygwin window you had open earlier and cd into the node directory you just extracted.

Run

./configure
make
make install

You're done! Super congrats since you did it the hard way!

If you feel like running the tests, run 

make test

Hopefully I made the process less painful; please let me know if this could be improved in the comments.

Enjoy!

Filed under  //   howto   nodejs   windows  
Posted July 1, 2010