From Doug Cutting’s Twitter:
“RT @StefanGroschupf: Happy 10th birthday #Nutch! Registered at sourceforce august 2002. Turned out to be quite a game changer.”
A real game changer indeed, and one of the best-known distributed open-source frameworks too. Hadoop, the MapReduce framework that started out as part of Nutch, became an even bigger game changer in the world of big data.
Nutch is now available in two branches. 1.x (1.5.1 right now) is more stable and has more plugins, while 2.x (still a bit raw) brings the Apache Gora interface, which lets you send data not only to Solr but also to various SQL/NoSQL data stores.
I will write more about Nutch 1.x soon, as I spent a lot of time over the last year implementing specialized crawling with it, so I know what a Nutch user needs in many typical cases.