Alexander Chepurnoy

The Web of Mind

Scala Clients for BTC-e Trade and Public Data APIs (My First Opensource Released)


I just released my first open-source component, a Scala client for the BTC-e Trade and Public Data APIs! BTC-e.com is a broker for trading Bitcoin, Litecoin, Namecoin and other cryptocurrencies. This post covers some choices made during development and how to use the clients.

Usage Details and Examples

  • Implement the ClientCredentials trait to connect to the Trade API:

      object MyCredentials extends ClientCredentials {
          val Key = "my key"
          val Secret = "my secret"
      }
    
  • Initialize the Trade API client:

      val tradeClient = new DefaultTradeApiClient(MyCredentials)
    
  • Get free funds info with

      tradeClient.getInfo.map(println(_))
    
    It will print something like
      USD: 4.7 RUR: 2399 EUR: 0.0 BTC: 10 LTC: 19.99 NMC: 0.0 NVC: 0.0 TRC: 0.0 PPC: 0.0
    
  • Cancel all open orders with the following code

      tradeClient.orderList.getOrElse(List()).foreach{order=>
          tradeClient.cancelOrder(order.orderId)
      } 
    
  • Create an order to sell 200 litecoins for $4.99 each

      tradeClient.trade(Currency.LTC, Currency.USD, Direction.Sell, 4.99, 200.00)
    
  • Close connections at the end

      tradeClient.releaseConnections 
    
  • Initialize the Public Data API client:

      val pubClient = new DefaultMarketDataApiClient
    
  • Get the last deal price from the ticker data for BTC/USD and print it:

      pubClient.ticker(Currency.BTC, Currency.USD).map{td=>
          println(td.last)
      }
    

See the MarketDataApiClient and TradeApiClient classes for more functions.

Implementation and Customization Details

  1. Common functions are located in btce.scala, the Trade API client in btce-trade.scala, the Public Data API client in btce-marketdata.scala and the Specs2 tests in BtceSpec.scala.

  2. There are many HTTP layer implementations. I implemented HTTP requests/responses using the WS framework from PlayFramework 2 (a Scala wrapper for the Ning framework). If your project doesn’t use PlayFramework and/or already uses another HTTP framework (e.g. Apache HttpClient), write your own implementation of the HttpApiClient trait. Override the functions getRequest(url: String): String (a simple GET request, used by the Public Data API), signedPostRequest(url: String, key: String, Secret: String, postBody: String): String (a POST request with an already signed postBody, used by the Trade API) and releaseConnections (shut down the connection pool here, if needed). Then define your own Trade API client with code like class MyTradeApiClient(credentials: ClientCredentials) extends TradeApiClient(credentials) with MyHttpApiClient, and class MyMarketDataApiClient extends MarketDataApiClient with MyHttpApiClient for the Public Data client.

  3. Enumerations were chosen over a hierarchy of sealed case classes, e.g.

     object Direction extends Enumeration {
         type Direction = Value
         val Sell = Value("sell")
         val Buy = Value("buy")
     }
    

    This may not be the best choice if you intend to build a trading DSL on top of it. But I have no plans for a trading DSL now.

  4. No logging is implemented in the released version, to avoid an extra dependency. If you incorporate a client into your software, add logging where needed (catch clauses, None results) with the logging framework your project uses.
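
To make item 2 more concrete, here is a minimal sketch of the mixin pattern it describes. Everything in it is a stand-in written for this example: HttpApiClient and MarketDataApiClient are simplified placeholders (only the three method signatures follow the list above, parameter names aside), while rawTicker, CannedHttpApiClient and the example.invalid URL are hypothetical. The canned implementation returns fixed strings instead of doing real HTTP, so treat it as an outline of where your own HTTP library calls would go, not as the library's actual code.

```scala
object HttpLayerSketch {
  // Placeholder for the library trait; the three signatures follow the post.
  trait HttpApiClient {
    def getRequest(url: String): String
    def signedPostRequest(url: String, key: String, secret: String, postBody: String): String
    def releaseConnections: Unit
  }

  // Simplified placeholder for the library's Public Data API client.
  abstract class MarketDataApiClient extends HttpApiClient {
    // Hypothetical helper: fetch the raw response for a currency pair.
    def rawTicker(pair: String): String =
      getRequest("https://example.invalid/api/" + pair + "/ticker")
  }

  // Custom HTTP layer. A real one would delegate to e.g. Apache HttpClient;
  // this one records URLs and returns canned text so it runs offline.
  trait CannedHttpApiClient extends HttpApiClient {
    private var requested = List.empty[String]
    private var released = false

    def requestedUrls: List[String] = requested.reverse
    def isReleased: Boolean = released

    override def getRequest(url: String): String = {
      requested = url :: requested
      """{"ticker":{"last":100.5}}"""
    }

    override def signedPostRequest(url: String, key: String, secret: String, postBody: String): String = {
      requested = url :: requested
      """{"success":1}"""
    }

    // A real implementation would shut down its connection pool here.
    override def releaseConnections: Unit = {
      released = true
    }
  }

  // The client is assembled by mixing the HTTP layer into the API class.
  class MyMarketDataApiClient extends MarketDataApiClient with CannedHttpApiClient
}
```

Because the HTTP layer is a separate trait, the same stub can be reused in tests while production code mixes in a real implementation.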

Again, the URL is https://github.com/kushti/btce-scala

Why Scala+PlayFramework Could Be the Best Choice for Your Startup


Do you plan to change this world with a web startup? Thinking about the technology stack? Monstrous Spring plus hundreds of other Java frameworks, or the elegant, trendy but slightly controversial Ruby on Rails? Don’t think about compromises, think about Scala+PlayFramework 2!

What does the Scala and PlayFramework combination give you?

  • Play’s CLI (command-line interface), hit-refresh workflow, the conciseness of Scala code and the powerful abstractions provided by the framework (and by dependent frameworks too, e.g. Specs2) give you stunning development speed. In fact, you get the development speed typical of a dynamic language together with all the benefits of strong static typing. A startup needs fast prototyping, so get it!

  • A startup should be scalable to handle fast growth of its userbase. The stateless framework architecture & built-in Akka support give you a high level of scalability.

  • Scala is a JVM language, which means you can easily use thousands of open-source frameworks for map-reduce data processing, NLP, ML, genetic algorithms etc. Java was (and is) the standard for academic open-source frameworks, the #1 language of the Apache Software Foundation (more than 100 open-source projects) etc. This speeds up prototyping, makes your system simpler (one platform means less headache) and makes your development process much cheaper.

  • Type safety gives you a more stable and predictable development process (easier refactoring, whole classes of errors avoided etc). Unit tests alone are not enough, there is no doubt.

  • Built-in asynchronous HTTP support makes developing modern web applications easy.

P.S. I’m passionate about PlayFramework 2.x (I have already used it for 3 Scala projects and 1 Java project). Next month I’m thinking about developing a navigation plugin (a bit like play navigator, but with a formal FSM approach). Please write to me if you want to contribute.

Play2+Morphia: How to Avoid ‘Can’t Parse Argument Number Interface’ Error


When running a Play2 + Morphia application, you can get an error like java.lang.IllegalArgumentException: can't parse argument number interface com.google.code.morphia.annotations.Id = @com.google.code.morphia.annotations.Id().

How to avoid it:

  • Add the SLF4JExtension for Morphia: http://code.google.com/p/morphia/wiki/SLF4JExtension. Here is an example of how to add the SBT dependencies, but please mind the difference between com.google.code.morphia and com.github.jmkgreen.morphia (make the appropriate changes): https://github.com/leodagdag/play2-morphia-plugin/blob/master/project/Build.scala .

  • Add 2 lines to the initialization of your Global object (or to its beforeStart method)

      import com.google.code.morphia.logging.MorphiaLoggerFactory
      import com.google.code.morphia.logging.slf4j.SLF4JLogrImplFactory
      import play.api.GlobalSettings
    
      object Global extends GlobalSettings{
          MorphiaLoggerFactory.reset()
          MorphiaLoggerFactory.registerLogger(classOf[SLF4JLogrImplFactory])
      }
    

Akka-based Data Extraction System Design


Introduction

If you have experience with data extraction systems, you know how hard they can be to develop. You need to implement workers and then combine them into a fault-tolerant, scalable and flexible system. Sounds like a lot of pain, doesn’t it? But with modern painkillers the job becomes much simpler. I mean Akka.

I have already used Akka for some real-world systems, including the realtime forex data mashup forexnotions.com, a domain value estimation system, data gathering systems for SEO parameters, real estate etc. And I want to publish the common approach I use, in its simplest form.

The Example

Consider a real estate data extraction system where some sources have XML/RSS output and some only HTML. The workers are already written, one for each site, so it’s time to combine them into higher-level logic. Say, for example, we have 3 sites to get data from, and we want to recrawl them every 30 seconds (too crazy, but it’s just an example).

A worker is derived from the base trait BasePropertyExtractor and returns either a list of properties or an exception. Let’s define sample workers as well as a sample property bean

case class Property(name:String)
case class ExtractionResult(value: Either[Throwable, List[Property]])

trait BasePropertyExtractor {
    def extractData:ExtractionResult
    def label:String
}

class SiteAExtractor extends BasePropertyExtractor{
    override def extractData = ExtractionResult(Right(List[Property](Property("Nice beachside boongalow"))))
    override def label = "SiteAExtractor"
}

class SiteBExtractor extends BasePropertyExtractor{
    override def extractData = ExtractionResult(Right(List[Property](Property("Awesome apartments"))))
    override def label = "SiteBExtractor"
}

class SiteCExtractor extends BasePropertyExtractor{
    override def extractData = ExtractionResult(Left(new Exception("XML parsing failed")))
    override def label = "SiteCExtractor"
}
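
Since every worker wraps its outcome in an Either, a caller can separate failures from extracted properties without any try/catch. The sketch below illustrates this in plain Scala, outside of any actor; it repeats the two case classes so that it compiles on its own, and the partition helper is a hypothetical name introduced only for this illustration.

```scala
object ExtractionResultSketch {
  // Repeated from the post so the sketch compiles on its own.
  case class Property(name: String)
  case class ExtractionResult(value: Either[Throwable, List[Property]])

  // Hypothetical helper: split results into (failures, extracted properties).
  def partition(results: Seq[ExtractionResult]): (List[Throwable], List[Property]) = {
    // Left carries the exception a worker ran into.
    val errors = results.collect { case ExtractionResult(Left(t)) => t }.toList
    // Right carries the list of properties a worker extracted.
    val properties = results.collect { case ExtractionResult(Right(ps)) => ps }.flatten.toList
    (errors, properties)
  }
}
```

The main actor shown later does the same thing message by message: a Left is logged, a Right is forwarded to the writer.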

Define the control signals to be sent to the system’s components

case class ExtractionCommand(extractor:BasePropertyExtractor)
object StartParsing

StartParsing is the signal to start the whole extraction process, while ExtractionCommand is the signal to start a concrete extractor

Data extraction actor:

import akka.actor.Actor

class ExtractingActor extends Actor with akka.actor.ActorLogging {
    override def receive = {
        // Run the given extractor and reply to the sender with its result
        case ExtractionCommand(extractor) =>
            println("Going to extract data by " + extractor.label)
            sender ! extractor.extractData
    }
}

Database writer actor (in our example it doesn’t actually write to a database, it just prints the result to the console):

class DbWriterActor extends Actor with akka.actor.ActorLogging {
    override def receive = {
        case p: Property  => println(p)
    }
}

And we’re going to define the main actor, encapsulating the control logic and implementing the ScatterGather design pattern (yeah, meet design patterns in the actors field):

import akka.actor.{Actor, Props, ReceiveTimeout}
import scala.concurrent.duration._

class ScatterGather extends Actor with akka.actor.ActorLogging {
    // Deliver ReceiveTimeout after 29 seconds without any message
    context.setReceiveTimeout(29.seconds)

    private val actorsCommands = Map(
        context.actorOf(Props[ExtractingActor]) -> ExtractionCommand(new SiteAExtractor),
        context.actorOf(Props[ExtractingActor]) -> ExtractionCommand(new SiteBExtractor),
        context.actorOf(Props[ExtractingActor]) -> ExtractionCommand(new SiteCExtractor)
    )

    private val dbWriterActor = context.actorOf(Props[DbWriterActor])

    override def receive = {
        // Scatter: send each extracting actor its command
        case StartParsing =>
            actorsCommands foreach {
                case (actor, command) => actor ! command
            }

        // Gather: log failures, forward extracted properties to the writer
        case result: ExtractionResult => result.value match {
            case Left(t) => log.warning("Exception found instead of result: " + t)
            case Right(properties) => properties foreach { dbWriterActor ! _ }
        }

        // Nothing more is coming: stop the children, then this actor
        case ReceiveTimeout =>
            actorsCommands.keys foreach { actor => context.stop(actor) }
            context.stop(dbWriterActor)
            context.stop(self)
    }
}

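And the launcher

import akka.actor.{ActorSystem, Props}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object RealEstateExtractionLauncher extends App {
    import ExecutionContext.Implicits.global
    val system = ActorSystem("RealEstateExample")

    // Every 30 seconds, create a fresh ScatterGather actor and start it
    system.scheduler.schedule(0.seconds, 30.seconds) {
        val listeningActor = system.actorOf(Props[ScatterGather])
        listeningActor ! StartParsing
    }
}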

Complete Code & Output

The complete code is on GitHub: https://github.com/kushti/blog-examples/blob/master/scala/akka/DataExtractionSystem.scala

Running it, you’ll get something like

Going to extract data by SiteCExtractor
Going to extract data by SiteBExtractor
Going to extract data by SiteAExtractor
Property(Awesome apartments)
Property(Nice beachside boongalow)
[WARN] [02/27/2013 13:22:40.331] [RealEstateExample-akka.actor.default-dispatcher-2] [akka://RealEstateExample/user/$a] Exception found instead of result: java.lang.Exception: XML parsing failed
Going to extract data by SiteAExtractor
Going to extract data by SiteCExtractor
Going to extract data by SiteBExtractor
Property(Nice beachside boongalow)
Property(Awesome apartments)
[WARN] [02/27/2013 13:23:10.297] [RealEstateExample-akka.actor.default-dispatcher-7] [akka://RealEstateExample/user/$b] Exception found instead of result: java.lang.Exception: XML parsing failed

Conclusion

In less than 100 lines of code we got a scalable, recurrent data extraction example with simple logging and error handling. And as Akka is a close friend of the Play 2.x framework (to be more precise, Play includes Akka), it’s easy to build a web application on top of our system.

You can now start to play with remote actors to build a distributed system. Or implement a real-world application starting from the example design provided. Or visit the “Hire me” section.

Play Framework (v. 2.1): How to Test Actions Processing Binary Streams and Set Content Type for FakeRequest


I spent a few hours on this and want to save a few hours for another Scala/Play developer, maybe you :) It was developed and tested with Play 2.1, and a Scala solution is given (but the Java alternative wouldn’t be far from it, I guess).

Consider an action like this one:

def filesEndpointPost = Action(parse.temporaryFile) {
    request =>
      request.headers.get(CONTENT_TYPE) match {
        case Some(cType) =>
          val fileExtension = cType.substring(cType.lastIndexOf("/") + 1, cType.length)
          val filename = RandomStringUtils.randomAlphanumeric(5) + "." + fileExtension
          val visibleFilename = filesFolder + "/" + filename
          request.body.moveTo(file(tmpFolder,filename), true)
          Created.withHeaders(LOCATION -> visibleFilename)
        case None =>
          BadRequest("No content type given")
      }
}
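
The action derives the file extension from the Content-Type header by taking everything after the last slash. The plain-Scala sketch below isolates that expression so its behaviour is easy to see. naiveExtension mirrors the expression used in the action, while extensionWithoutParams is a hypothetical variant, not part of the original code, that first drops header parameters such as a charset.

```scala
object ContentTypeSketch {
  // Same idea as in the action: everything after the last '/'.
  def naiveExtension(contentType: String): String =
    contentType.substring(contentType.lastIndexOf("/") + 1)

  // Hypothetical variant: strip parameters such as "; charset=utf-8" first.
  def extensionWithoutParams(contentType: String): String =
    naiveExtension(contentType.takeWhile(_ != ';').trim)
}
```

For "application/pdf" both functions return "pdf". For "text/plain; charset=utf-8" the first returns "plain; charset=utf-8" and the second returns "plain", which may matter if clients send parameters in the header.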

You want to test it. The test should:

  • Send a binary stream to the action
  • Set the Content-Type header. By default it’s rewritten automatically by the framework, so a custom play.api.http.Writeable implementation needs to be passed to the route() function

With such requirements, a solution is:

"Send binary stream with POST to /files" in new WithApplication {
  val filesPostRoute = route(FakeRequest(POST,
    controllers.routes.FilesController.filesEndpointPost().url,
    FakeHeaders(Seq(CONTENT_TYPE -> Seq("application/pdf"))),
    "brokenpdf"))(new Writeable({s: String => s.getBytes}, None))

  val result = filesPostRoute.get
  status(result) mustEqual CREATED
  header(LOCATION, result).getOrElse("") must contain("files/")
  header(LOCATION, result).getOrElse("") must contain(".pdf")
}

Valid Scala vs. Valid HTML : Play Views vs. Lift Views


Play2 and Lift are the major web frameworks in the Scala world. And they are different. Let me describe the view layer (V in MVC) design principles of these frameworks. Note: the article covers only the default and most popular approaches in both frameworks, but there are many possibilities to go another way.

Play2 views are rendered with a Scala-based template engine that provides type-safe, compile-time checked templates. So a Play2 view is valid Scala code that is checked during compilation:

@(news: List[NewsElement], events:List[FxEvent])
@import models._

@tilesOut(content: List[models.ContentElement]) = {
    <div id="home-tiles" class="container-fluid metro">
    @for((contentElem, index) <- content.zipWithIndex) {
        @if(index%columnsInRow==0){
            <div class="row-fluid">
        }
        <div class="span3">
            @shortInfo(contentElem)
        </div>
        @if(index%columnsInRow==columnsInRow-1 || index==content.size-1){
            </div>
        }
    }
    </div>
}


@wrapper("Title",
    "Description",
    "keys"){

    <h1>News</h1>
    @tilesOut(news)

    <h1>Events</h1>
    @tilesOut(events)
}

(real example from a metro-style site)
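
The template above opens a row when index % columnsInRow is 0 and closes it after the last column or the last element. In plain Scala the same layout rule can be expressed as grouping the content into chunks. The sketch below is only an illustration of that rule: tileRows is a hypothetical helper name, and columnsInRow is passed as a parameter because its definition is not shown in the excerpt.

```scala
object TileRowsSketch {
  // Split items into rows of at most columnsInRow elements, keeping order.
  def tileRows[A](content: List[A], columnsInRow: Int): List[List[A]] =
    content.grouped(columnsInRow).toList
}
```

With 5 elements and 4 columns this yields one full row of 4 and a second row holding the remaining element, which is the case the index == content.size - 1 check in the template handles.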

On the other hand, a Lift view developed with the “view-first” approach is valid HTML code like this:

...
<div lift="comet?type=Clock">
    <span id="time"></span>
</div> 
... 

(taken from Lift tutorial)

With controller code like:

object Tick
class Clock extends CometActor {
    Schedule.schedule(this, Tick, 10 seconds)

    def render = "#time *" #> now.toString

    override def lowPriority = {
        case Tick => 
            val js = SetHtml("time", Text(now.toString))
            partialUpdate(js) 
            Schedule.schedule(this, Tick, 10 seconds) 
    }
}

Lift has MVC support also, but “view-first” is the main principle supported by the Lift community.

Valid Scala vs. valid HTML? For me, the Play2 way is more natural; Lift could be better only in some cases (however, it’s friendlier to an HTML person on your team).

P.S. A nice Stack Overflow conversation: Should I use Play or Lift for doing web development in Scala?

Nutch 1.6 Is Coming


Nice news from the Nutch developer mailing list: a Nutch 1.6 RC will be available in the next few days. More than 40 bugs/feature requests closed!

Nutch is now growing in 2 branches concurrently: 1.x and 2.x. For now 1.x seems to be more stable and has more plugins implemented, but the 2.x branch uses Apache Gora, so it’s possible to write crawled data to a bunch of SQL/NoSQL datastores, not just to Solr (as with 1.x). The latest 2.x version, 2.1, was released on October 5th.

Open Information Land Tightens, Digg Has No RSS Now.


According to Alexa, most Digg.com users found the relaunched site worse than ever. However, that was predictable. But among the other stupid things done by the NY hipster crew there is one that is both trendy and totally ugly. I mean the lack of standard information sharing support in the form of RSS/Atom.

First, Facebook and Twitter removed their RSS links. That’s a scary trend in the era of the ecosystem war. But now I can’t find an RSS link even on the Digg.com news site! Instead, I’m offered an iPhone/iOS application to download. But I don’t have any Apple gadget and don’t want to install any additional application. I want to add news sources to my RSS reader and get all the buzz in one place.

The future of the Web can only be in open information, meaningful mashups and data syndication. Today’s trend of closing information off benefits only the big companies that own large social networks, not a user who only wants to read quality news collected from tens or hundreds of sources or get aggregated analysis of open data.

How to Make Blekko Your Default Search Engine in Firefox


Kinda weekend post, no programming now.

As you may know, Blekko is a spam-free, community-driven search engine with hashtags. So what about replacing outdated Google with this amazing search engine?

It’s as simple as that:

  1. Type about:config in the Firefox address bar then press ENTER.

  2. Locate and double-click the entry for keyword.URL

  3. Set the new value to https://blekko.com/ws/

Enjoy! P.S. Blekko and DuckDuckGo are my favorite search providers!

How to Install Hadoop 1.0.3 on Cluster (+Nutch)


There are many tutorials on the Web on how to install Hadoop (and Nutch) in a cluster environment. But for me, none of them was 100% sufficient. So I’m going to give you another tutorial. I hope it helps.

The tutorial was tested on a CentOS-powered cluster.

The tutorial assumes the cluster has only one master node. However, real medium and large installations have more than one master machine.

Installation

  • Check that all needed packages are installed: Sun Java, sshd
  • Give all machines hostnames (for example devcluster.master0, devcluster.slave0, devcluster.slave1 etc). Set the hostname in the /etc/hostname file with sudo nano /etc/hostname (then reboot, relogin or restart the network daemon)
  • Append hostname resolution entries to the /etc/hosts file on every(!) machine. Delete other hostnames associated with the machine’s IP from /etc/hosts (for example, a CentOS xxx.xxx.xxx.xxx entry on CentOS)
  • If a firewall is installed, open ports 54310, 54311, 50030, 50060, 50070
  • Test that everything is OK with ports and hostnames. From one machine, try to ping another machine: ping devcluster.slave1
  • Create the Hadoop user (on every machine, of course): useradd hadoop (don’t set a password)
  • su hadoop (further commands are for the “hadoop” user unless specified otherwise)
  • Generate an SSH key to make login possible without a password: ssh-keygen (on one machine, the master, for example. Enter no passphrase!)
  • Now spread the generated key over the cluster: ssh-copy-id hadoop@devcluster.slave0 (etc. for all machines)
  • Check that password-less key-based SSH login works correctly: ssh localhost, then ssh devcluster.slave0 and so on for all machines
  • Add to .bashrc

      HOME_HADOOP=/etc/hadoop
      PATH=$PATH:$JAVA_HOME/bin:$HOME_HADOOP/bin
    
  • Download Hadoop 1.0.3 and make changes to the configuration files according to the official documentation. If you are installing Nutch, download it, build it and copy the .job file (and bin/nutch) to, say, a nutch folder under the Hadoop folder. Then pack the Hadoop folder again and move it to a place visible from all machines

  • Unpack the package to /etc/hadoop (on all machines)
      sudo mkdir /etc/hadoop
      sudo chown hadoop:hadoop /etc/hadoop
      cd /etc/hadoop
      wget http://[repository-url]/hadoop.tar.gz
      tar zxvf hadoop.tar.gz
      sudo chown -R hadoop:hadoop *
    
  • Ensure there are no problems by simply launching bin/hadoop. If some problems occur (probably JAVA_HOME is not set), now is the time to fix them.
  • Format HDFS (run this command on the master node only): bin/hadoop namenode -format

Installation done!

Usage

  • Start the Hadoop cluster: bin/start-all.sh (check that it started OK).
  • Run the WordCount example v. 2.0 from the official tutorial
  • If you installed Nutch, try to run a Nutch task, e.g. a crawl with the seed URLs given in the urllist.txt file:
      mkdir urls
      cp [pathtourllist file]/urllist.txt urls/
      bin/hadoop dfs -put urls urls
      bin/hadoop dfs -cat urls/urllist.txt
      nutch/bin/nutch crawl urls -dir crawled -depth 2
      bin/hadoop dfs -copyToLocal crawled ../results
    
  • Stop the cluster: bin/stop-all.sh

Monitoring

  • Check the /etc/hadoop/logs folder for the keywords “error” and “exception”
  • Check the JobTracker status by opening http://[masterNode]:50030/jobtracker.jsp in a browser
  • Check the HDFS status by opening http://[masterNode]:50070/dfshealth.jsp in a browser
  • Check a slave node’s task tracker status with http://[slaveNode]:50060/tasktracker.jsp
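
The first check, scanning logs for the keywords “error” and “exception”, could be automated with a small helper like the sketch below. It is written in Scala to stay in line with the rest of this blog, and it is only an illustration: the object and function names are hypothetical, it assumes UTF-8 text logs, and it makes no assumption about Hadoop's log format beyond plain lines of text.

```scala
object LogScanSketch {
  private val keywords = Seq("error", "exception")

  // Keep the lines that mention any keyword, ignoring case.
  def suspiciousLines(lines: Iterator[String]): List[String] =
    lines.filter { line =>
      val lower = line.toLowerCase
      keywords.exists(k => lower.contains(k))
    }.toList

  // Scan a single log file; assumes the file is UTF-8 text.
  def scanFile(file: java.io.File): List[String] = {
    val source = scala.io.Source.fromFile(file, "UTF-8")
    // The list is built before the source is closed.
    try suspiciousLines(source.getLines()) finally source.close()
  }
}
```

A call such as LogScanSketch.scanFile(new java.io.File(path)) for each file under /etc/hadoop/logs would return only the lines worth reading first.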

Troubleshooting

In case of errors, check the logs for exceptions, then google them. Or click ‘hire me’ for a consultation.