Mapping and processing data in Clojure

I have to admit that I have a thing for DSLs. You can see it in music-as-data, where notes and rhythm/beat are "mapped" to data and you can apply data transformations to them.

I want to do the same thing with data at rest.

Here is a scenario: I have lots of data sitting as CSV files on my hard drive and I want to process them. Not query them. Process them.

What would be really interesting is to be able to define (dynamically) a schema like this:

(defschema "EURUSD" 
    (tokenizer #(.split % ":"))  
    ;; the mapping is done here
    (columns |time| |open| |high| |low| |close| |volume|))

Let me explain. First of all, there is a "tokenizer" function. Each data line is tokenized by this function. Do you want a regex? Something more complex? You are free to write anything you like. I really hate frameworks where you must write a complex regular expression or use a complicated system just to tokenize a line.

As you can imagine, the tokenizer returns a list of values that are mapped to "columns".
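A minimal sketch of how this could work under the hood, assuming a schema is just a tokenizer function plus an ordered list of column names (the `register-schema!` and `parse-line` names, and the atom-based registry, are my assumptions, not an existing API):

    (require '[clojure.string :as str])

    ;; Hypothetical registry: schema name -> {:tokenizer fn, :columns [...]}
    (def schemas (atom {}))

    (defn register-schema! [name tokenizer columns]
      (swap! schemas assoc name {:tokenizer tokenizer :columns columns}))

    ;; Roughly what (defschema "EURUSD" ...) could expand to.
    (register-schema! "EURUSD"
                      #(str/split % #":")
                      [:time :open :high :low :close :volume])

    (defn parse-line
      "Tokenize a raw line and map the tokens onto the schema's columns."
      [schema-name line]
      (let [{:keys [tokenizer columns]} (get @schemas schema-name)]
        (zipmap columns (tokenizer line))))

    ;; (parse-line "EURUSD" "08.00:1.4401:1.4410:1.4395:1.4405:1000")
    ;; => {:time "08.00", :open "1.4401", :high "1.4410",
    ;;     :low "1.4395", :close "1.4405", :volume "1000"}

Because the tokenizer is just a plain function, swapping the `:`-split for a regex or a custom parser means changing one line of the schema and nothing else.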

Now, the interesting stuff.

You can write scripts like the following:

(if (> |close| 1.45)
    (place-order :buy)
    (place-order :sell))
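One way the `|close|`-style column references could be wired up (the pipe syntax itself would need reader support, so this sketch binds plain symbols instead; `with-row`, `decide`, and `place-order` are hypothetical names I made up for illustration):

    ;; Hypothetical macro: bind a parsed row's columns as local names,
    ;; so a strategy script can refer to columns directly.
    (defmacro with-row [row & body]
      `(let [{:keys [~'time ~'open ~'high ~'low ~'close ~'volume]} ~row]
         ~@body))

    (defn place-order [side]
      (println "placing order:" side)
      side)

    (defn decide [row]
      (with-row row
        (if (> (Double/parseDouble close) 1.45)
          (place-order :buy)
          (place-order :sell))))

    ;; (decide {:close "1.46"}) => :buy
    ;; (decide {:close "1.44"}) => :sell

The script body stays exactly as the user wrote it; the macro just supplies the bindings from whatever row the tokenizer produced.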




I love Forex because:

  1. It has enormous amount of data (volume)
  2. These data are coming extremely fast (velocity)
  3. You need to consider multiple resources when you are building your strategy (variety)

My definition of BigData is that you have volume-velocity-variety information and you need to react to it right now (in realtime). That is one of the main reasons why I don't like Hadoop (ok, the other is that I don't like Java :).

Forex is the best place to start playing with BigData. You have (at least) one data channel hitting you with data, you need to keep running algorithms on this stream (sometimes doing correlations over up to a week of data), and you need to be able to respond very fast. If a garbage collector kicks in, or if you need to grab data from a database (even if that DB is in memory - long live Redis), then you will have issues.

That's the reason why most "trading" databases keep all their data in the same memory space and use custom languages for the analysis (like Kdb).

That was the inspiration for LDB.

Millions of data sources (mobile phones) hitting your database, and for each request calculating/updating thousands of counters and running all sorts of algorithms. Per request. In realtime.

But let's face it: the vast majority of users/companies will never have millions (or even thousands) of requests hitting their servers. That's why I started a new open-source database, codename: HybrisDB.

HDB has the following characteristics:

  1. Simple to install (no moving parts)
  2. Simple to use (pre-defined dashboards)
  3. It will be perfect for the 99% of users/companies, but not for the 1% like Facebook or Google (it sacrifices Enterprise features)

The concept is to have a dashboard, to watch indicators going on/off, and then (maybe) connect to a system to place an order.

It sounds like an interesting, cool hobby project, and I'm still trying to decide between Erlang and Clojure for it.

Ping me on twitter if you have any ideas!