Tuesday, October 7, 2014
Some source code for the previous post
https://www.dropbox.com/s/blv7zci9p37gvy4/eurusd4.tar.gz?dl=0
Fitting Pybrain's RNN prediction
The following picture illustrates this problem. The blue signal is the RNN target set. The green signal is the raw RNN output for the training set input. The red signal is a crude attempt at fitting the green signal to the blue signal. It's merely a vertical displacement, together with an increase in amplitude, of the raw green signal.
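A shift plus a change in amplitude is a linear map, `target ≈ gain * raw + offset`. The sketch below is one way to pick those two numbers, assuming an ordinary least-squares fit. The post does not say how the red curve was actually produced, so the method and the helper name `fit_gain_offset` are illustrative assumptions.

```c
#include <stddef.h>

/*
 * Least-squares fit of target[i] ~ gain * raw[i] + offset.
 * Assumed fitting method (hypothetical helper), not necessarily
 * the one used for the red curve in the plot.
 * Returns 0 on success, -1 if the fit is undefined
 * (fewer than 2 samples, or raw[] is constant).
 */
int fit_gain_offset(const double *raw, const double *target, size_t n,
                    double *gain, double *offset){
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    double denom;
    size_t i;
    if(n < 2) return -1;
    for(i = 0; i < n; i ++){
        sx += raw[i];
        sy += target[i];
        sxx += raw[i] * raw[i];
        sxy += raw[i] * target[i];
    }
    /* n * sum(x^2) - (sum x)^2 is zero when all raw[] values are equal */
    denom = (double) n * sxx - sx * sx;
    if(denom == 0.0) return -1;
    *gain = ((double) n * sxy - sx * sy) / denom;
    *offset = (sy - *gain * sx) / (double) n;
    return 0;
}
```

Here `raw` would hold the RNN output for the training input (green) and `target` the training targets (blue). The fitted curve is `gain * raw[i] + offset`.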
Wednesday, September 24, 2014
First attempt at implementing recurrent neural network FOREX trend prediction
My first attempts at implementing the experiment from the Russian paper with neurolab failed miserably. It just doesn't have all the options I needed. I've decided to switch to PyBrain (https://github.com/pybrain/pybrain/wiki/installation). Note that I've only worked with the EUR/USD trading rate.
Although the results were better, they were not satisfactory. The following plot shows how close the Elman RNN got to the normalized training set. After around 1500 epochs, PyBrain couldn't get any closer. Also, the quality seems to depend heavily on the initial random values of the net.
Sunday, August 31, 2014
Considering an ARIMA model
I came across a blog post and a paper describing methods for implementing an ARIMA model as a price forecaster. They both deal with the stock market, which is not exactly the same as FOREX. But given the lack of ARIMA examples involving FOREX, I've decided to take a look at these:
http://programming-r-pro-bro.blogspot.mx/2013/04/forecasting-stock-returns-using-arima.html
http://www.hindawi.com/journals/jam/2014/614342/
From the blog post it seems that ARIMA doesn't have a very good resolution in its predictions. The most it seems to be able to predict is a likely dynamic range for the future return rate. The paper shows how an ANN predicts fluctuations more closely. Still, this paper uses an FFNN, whereas the Russian paper from one of my earlier posts uses an SRN.
ARIMA seems more limited than the SRN in its predictive power. Therefore, I have decided to give priority to the SRN approach using the newelm function from neurolab which I mentioned in an earlier post.
The ARIMA implementation, nevertheless, has inspired me to attempt a new rustic buy-and-hold algorithm. One which takes into account the density of price falls through time, and a likely dynamic range for it. I will be reporting on the results later on.
Kalman filter disappointment
http://greg.czerniak.info/guides/kalman1/
http://bilgin.esme.org/BitsBytes/KalmanFilterforDummies.aspx
From the formulae for Kk and Pk, it seems that all this Kalman filter does is damp the signal, since Kk simply decreases with each iteration. That is not very impressive.
I wasn't able to extract any "predictive" component from this implementation of the Kalman filter. I've decided to desist with this approach.
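The damping effect is easy to reproduce with a scalar Kalman filter. The sketch below uses the usual scalar update equations for a constant-state model. The function name `kalman_1d` and the noise and initial values are illustrative assumptions, not taken from the linked tutorials.

```c
#include <stddef.h>

/*
 * Scalar Kalman filter for a constant-state model (x_k = x_{k-1}).
 * z[]     : n measurements
 * q, r    : process and measurement noise variances (assumed values)
 * x0, p0  : initial estimate and initial error covariance (assumed values)
 * xhat[]  : n filtered estimates (output)
 * gains[] : n Kalman gains K_k (output)
 */
void kalman_1d(const double *z, size_t n, double q, double r,
               double x0, double p0, double *xhat, double *gains){
    double x = x0, p = p0;
    size_t k;
    for(k = 0; k < n; k ++){
        /* time update: the state is modelled as constant, uncertainty grows by q */
        double p_prior = p + q;
        /* measurement update */
        double kk = p_prior / (p_prior + r);
        x = x + kk * (z[k] - x);
        p = (1.0 - kk) * p_prior;
        xhat[k] = x;
        gains[k] = kk;
    }
}
```

With `q = 0` the gain shrinks at every step, so each new price moves the estimate less and less and the output behaves like a running average. A nonzero `q` keeps the gain from vanishing, but the model still only smooths the measurements. It has no term that could extrapolate a trend.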
Thursday, August 28, 2014
Using a Kalman filter to predict short-term currency price fluctuation
http://www.gaussianprocess.org/gpml/chapters/RW.pdf
But it involves concepts which are too advanced and complicated for me at this moment. Besides, Gaussian processes are not the same as Gaussian noise. It seems that a Gaussian process is formally defined as a process involving a "multivariate normal distribution". Since the price fluctuations I'm studying have only one variable, I believe using Gaussian process prediction methods would be overkill. I might be wrong, though.
Another method is one used to filter out Gaussian noise. It's called "Kalman filtering". Kalman filtering seems useful, because it works by predicting Gaussian noise in order to eliminate it. Because currency price fluctuations follow a Gaussian distribution, I think the predictive component of the Kalman filter may be useful in predicting short-term future fluctuations in currency price.
The following seems like a good, comprehensive, introduction to Kalman filtering:
http://www.cs.unc.edu/~tracker/media/pdf/SIGGRAPH2001_CoursePack_08.pdf
So the basic idea is to treat FOREX price fluctuation as if it were Gaussian noise, and try to predict it short-term with a Kalman filter.
I want to be able to predict short-term future price fluctuations because I discovered that sudden and large price decrements produce important losses when using my rustic buy-and-hold algorithm. These predictions may turn out to be useful in setting up an effective predictive stop-loss alarm for my rustic buy-and-hold algorithm.
Sunday, August 24, 2014
Found normal distributions in price changes and price decrement frequencies
First attempt at a rustic buy-and-hold forex bot
Here's the source code. It's a little rough, fyi:
/*
Rustic buy-and-hold
Requires fine tuning. Perhaps make it adaptive.
Vicente Oscar Mier Vela
<vomv1988@gmail.com>
Example
$ cat samples_1407099600_1407531600_M1.dat | ./t6 1407099600 1407531600 60 200 20 20
Example of samples_START_END_TB.dat
$ cat samples_1407099600_1407531600_M1.dat | head
1407099600
1.34247
1.34359
1407099660
1.34285
1.34326
1407099720
1.34291
1.34331
1407099780
This one tries to link density of losses with absolute losses using
local average minimum / maximum algorithm.
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define closeBid 0
#define closeAsk 1
int cmp(const void *x, const void *y);
int main(int argc, char *argv[]){
int startdate = atoi(argv[1]);
int enddate = atoi(argv[2]);
int timebase = atoi(argv[3]);
int nsamples = (enddate - startdate) / timebase;
int expdate;
int currdate;
double *samples[2];
samples[closeBid] = malloc(sizeof(double)*nsamples);
samples[closeAsk] = malloc(sizeof(double)*nsamples);
int i;
for(i = 0, expdate = startdate; i < nsamples; i ++, expdate += timebase){
scanf("%d", &currdate);
/*
The following assumes that the first date from the
dataset will always yield a value. That is: 1st date is
always "ticked".
*/
if(currdate == expdate){
scanf("%lf", samples[closeBid] + i);
scanf("%lf", samples[closeAsk] + i);
} else {
while(currdate != expdate && i < nsamples){
samples[closeBid][i] = samples[closeBid][i-1];
samples[closeAsk][i] = samples[closeAsk][i-1];
expdate += timebase;
i++;
}
if(i < nsamples){
scanf("%lf", samples[closeBid] + i);
scanf("%lf", samples[closeAsk] + i);
}
}
}
int *diffs = (int *) malloc(sizeof(int) * nsamples);
diffs[0] = 0;
for(i = 1; i < nsamples; i ++)
if(samples[closeBid][i] - samples[closeBid][i - 1] < 0)
diffs[i] = -1;
else if(samples[closeBid][i] - samples[closeBid][i - 1] > 0)
diffs[i] = 1;
else
diffs[i] = 0;
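/* calloc zero-fills diffs2, so the first and last `range` entries stay 0
and the counters in between start from 0. The original zeroing loops
also wrote to diffs2[nsamples], one past the end of the array. */
int *diffs2 = (int *) calloc(nsamples, sizeof(int));
int k;
int range = 10;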
for(i = range; i < nsamples - range; i ++)
for(k = range * -1 ; k < 0 ; k ++){
if(diffs[i + k] == -1)
diffs2[i] ++;
}
for(i = 0; i < nsamples; i ++)
printf("%d >> %d\n", diffs2[i], diffs[i]);
int j, t = atoi(argv[4]), mxs = atoi(argv[5]), mns = atoi(argv[6]);
double *s = (double *) malloc(sizeof(double) * t);
double *q;
double sum, loc_avg_min = 0, loc_avg_max = 0, bal = 0;
int flag = 0;
for(i = 0; i + t < nsamples; i ++){
q = samples[closeBid] + i;
for(j=0;j<t;j++)
s[j] = q[j];
qsort(s, t, sizeof(double), cmp);
for(j = 0, sum = 0; j < mns; sum += s[j], j ++);
loc_avg_min = sum / (double) mns;
for(j = t - 1, sum = 0; j >= t - mxs; sum += s[j], j --);
loc_avg_max = sum / (double) mxs;
if(flag == 0 && samples[closeBid][i + t - 1] <= loc_avg_min){
bal -= samples[closeAsk][i + t - 1];
flag = 1;
}
if(flag == 1 && samples[closeBid][i + t - 1] >= loc_avg_max){
bal += samples[closeBid][i + t - 1];
flag = 0;
printf("%f %d\n", bal, diffs2[i + t - 1]);
}
}
free(samples[closeBid]);
free(samples[closeAsk]);
free(s);
free(diffs);
free(diffs2);
return 0;
}
int cmp(const void *x, const void *y){
double xx = *(double*)x, yy = *(double*)y;
if (xx < yy) return -1;
if (xx > yy) return 1;
return 0;
}
Wednesday, August 20, 2014
Idea for a very rustic buy-and-hold algorithm
In these last few days I've been imagining a new (?) algorithm for doing simple buy-and-hold trades. The key is in finding the most probable highest and lowest currency prices for a given point in time. It would involve something like taking a list of the last N prices, sorting it, and averaging the top M values to get some "local average maximum" and the last L to get a "local average minimum". Then, placing limit orders with those values. Come to think of it, it sounds a little like RSI's overbought and oversold indicators. But it's not quite it.
Pretty crude. But I'll do it while I work on the NN approach, and see what happens.
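A minimal sketch of the "local average maximum" and "local average minimum" described above, assuming the last N prices are already in an array. The helper name `local_avg_extremes` and its argument layout are illustrative only.

```c
#include <stdlib.h>
#include <string.h>

/* ascending comparison for qsort */
static int cmp_double(const void *x, const void *y){
    double a = *(const double *) x, b = *(const double *) y;
    return (a > b) - (a < b);
}

/*
 * From n prices, average the m largest ("local average maximum")
 * and the l smallest ("local average minimum").
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 */
int local_avg_extremes(const double *prices, size_t n, size_t m, size_t l,
                       double *avg_max, double *avg_min){
    size_t i;
    double sum;
    double *sorted;
    if(n == 0 || m == 0 || l == 0 || m > n || l > n) return -1;
    sorted = malloc(sizeof(double) * n);
    if(sorted == NULL) return -1;
    /* sort a copy so the caller's price history keeps its time order */
    memcpy(sorted, prices, sizeof(double) * n);
    qsort(sorted, n, sizeof(double), cmp_double);
    for(i = 0, sum = 0; i < l; i ++) sum += sorted[i];
    *avg_min = sum / (double) l;
    for(i = 0, sum = 0; i < m; i ++) sum += sorted[n - 1 - i];
    *avg_max = sum / (double) m;
    free(sorted);
    return 0;
}
```

The two averages would then serve as the prices for the buy and sell limit orders.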
Second thoughts about tuning the MACD with a GA
http://forums.randi.org/showthread.php?t=96372
That said, I've come across a paper which says neural networks do a good job predicting forex market trends (http://arxiv.org/pdf/cond-mat/0304469.pdf). It uses a neural network architecture which is a mix between an Elman and a Jordan SRN. I believe it doesn't say what training algorithm they used to teach the network to predict market trends. In any case, I will probably be using RTRL, because it seems less resource-consuming than BPTT. I doubt my crappy computer can handle BPTT for the large amounts of data I plan to feed my SRN. Also, I would like to start with a pure Elman architecture, instead of the "Elman-Jordan" architecture suggested in the paper. I just don't have the expertise in NN's to copy the paper step by step.
I should mention I've never implemented an FFN, much less an SRN. After spending several days looking into the details of how FFN's and SRN's work, I've come up with a new TODO list:
- Understand how FFN's work (check)
- Understand how SRN's work (check)
- Understand the backpropagation algorithm for FFN's
- Understand the BPTT algorithm for SRN's
- Understand the RTRL algorithm for SRN's
So who knows... I guess I'll wait and see what my own experiments tell me.
Sunday, August 17, 2014
MACD fitness function for the GA
In the following source code, macdbal() is this fitness function. The program works by filling in all the missing candles from the data downloaded by the dl2.sh script from an earlier post. It then calculates the moving averages used to obtain the MACD (exponential moving averages, each seeded with a simple moving average). Based on the MACD, macdbal() "buys" (subtracts the closing ask price from the balance) and "sells" (adds the closing bid price to the balance).
/*
MACD fitness function
Vicente Oscar Mier Vela
<vomv1988@gmail.com>
Use the output of "Oanda 5000 candle limit bypasser" script
as input for this program.
Example:
$ ./dl2.sh "Aug 3 21:00:00 GMT 2014" "Aug 8 21:00:00 GMT 2014" M1 > 5KOUT
$ cat 5KOUT | ./gen2 1407099600 1407531600 60 17 91 99
The output from above should be:
Success rate 29.411765%
Final balance: 0.001420
The first two arguments of gen2.c are the dates used for dl2.sh in UNIX time format.
For example, use:
$ date -d "Aug 3 21:00:00 GMT 2014" +%s
to obtain
1407099600
*/
#include <stdio.h>
#include <stdlib.h>
#define closeBid 0
#define closeAsk 1
double macdbal(int emalow, int emahigh, int emasignal, double *closeask, double *closebid, int length);
int main(int argc, char *argv[]){
int startdate = atoi(argv[1]);
int enddate = atoi(argv[2]);
int timebase = atoi(argv[3]);
int emalow = atoi(argv[4]);
int emahigh = atoi(argv[5]);
int emasignal = atoi(argv[6]);
int nsamples = (enddate - startdate) / timebase;
int expdate;
int currdate;
double *samples[2];
samples[closeBid] = malloc(sizeof(double)*nsamples);
samples[closeAsk] = malloc(sizeof(double)*nsamples);
int i;
for(i = 0, expdate = startdate; i < nsamples; i ++, expdate += timebase){
scanf("%d", &currdate);
/*
The following assumes that the first date from the
dataset will always yield a value. That is: 1st date is
always "ticked".
*/
if(currdate == expdate){
scanf("%lf", samples[closeBid] + i);
scanf("%lf", samples[closeAsk] + i);
} else {
while(currdate != expdate && i < nsamples){
samples[closeBid][i] = samples[closeBid][i-1];
samples[closeAsk][i] = samples[closeAsk][i-1];
expdate += timebase;
i++;
}
if(i < nsamples){
scanf("%lf", samples[closeBid] + i);
scanf("%lf", samples[closeAsk] + i);
}
}
}
double x = macdbal(
emalow, emahigh, emasignal,
samples[closeAsk], samples[closeBid],
nsamples
);
printf("Final balance: %f\n",x);
free(samples[closeBid]);
free(samples[closeAsk]);
return 0;
}
double macdbal(int emalow, int emahigh, int emasignal, double *closeask, double *closebid, int length){
int i;
double *emalows = malloc(sizeof(double)*length);
double *emahighs = malloc(sizeof(double)*length);
double *emasignals = malloc(sizeof(double)*length);
/* zero the EMA buffers; closebid was already filled by main(),
so it must not be read from stdin again here */
for(i = 0; i < length; i ++)
emalows[i] = emahighs[i] = emasignals[i] = 0;
double sum;
for(i = 0, sum = 0; i < emalow; i ++){
emalows[i] = closebid[0];
sum += closebid[i];
}
emalows[emalow - 1] = sum / ((double) emalow);
double lowmult = 2.0 / (((double) emalow) + 1.0);
emalows[emalow] = (closebid[emalow] - emalows[emalow - 1]) * lowmult + emalows[emalow - 1];
for(i = emalow + 1; i < length; i ++)
emalows[i] = (closebid[i] - emalows[i - 1]) * lowmult + emalows[i - 1];
for(i = 0, sum = 0; i < emahigh; i ++){
emahighs[i] = closebid[0];
sum += closebid[i];
}
emahighs[emahigh - 1] = sum / ((double) emahigh);
double highmult = 2.0 / (((double) emahigh) + 1.0);
emahighs[emahigh] = (closebid[emahigh] - emahighs[emahigh - 1]) * highmult + emahighs[emahigh - 1];
for(i = emahigh + 1; i < length; i ++)
emahighs[i] = (closebid[i] - emahighs[i - 1]) * highmult + emahighs[i - 1];
int signalstart = emahigh * 2;
for(i = 0; i < signalstart + emasignal; i ++)
emasignals[i] = 0;
for(i = signalstart, sum = 0; i < signalstart + emasignal; i ++)
sum += emalows[i] - emahighs[i];
emasignals[signalstart + emasignal - 1] = sum / ((double) emasignal);
double signalmult = 2.0 / (((double) emasignal) + 1.0);
emasignals[signalstart + emasignal] =
(
(emalows[signalstart + emasignal] - emahighs[signalstart + emasignal]) -
emasignals[signalstart + emasignal - 1]
) * signalmult + emasignals[signalstart + emasignal - 1]
;
for(i = signalstart + emasignal + 1; i < length; i ++)
emasignals[i] =
((emalows[i] - emahighs[i]) - emasignals[i - 1]) * signalmult + emasignals[i - 1]
;
double last = 0;
double balance = 0;
double prevbal = 0;
double lastsell = 0;
double total = 0;
double success = 0;
for(i = signalstart + emasignal; i < length; i ++){
if(emasignals[i] > 0 && last < 0){
last = emasignals[i];
prevbal = balance;
balance -= closeask[i];
}
if(emasignals[i] < 0){
if(last == 0){
/* memo to myself: get ready to buy before starting this loop */
last = emasignals[i];
}
if(last > 0){
last = emasignals[i];
prevbal = balance;
balance += closebid[i];
if(lastsell < balance)
success ++;
total ++;
lastsell = balance;
}
}
}
/* remove this for genetic algorithm */
/* "%%" prints a literal percent sign */
printf("Success rate %f%%\n",100*(success/total));
free(emalows);
free(emahighs);
free(emasignals);
/* return balance after last sell */
if(last > 0)
return prevbal;
else
return balance;
}
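The same recurrence appears three times in macdbal(): for the short EMA, the long EMA and the signal line. It can be isolated into one helper. The sketch below uses a hypothetical function name, `ema_series`. It seeds the EMA with a simple moving average in the same way, but leaves the entries before the seed at 0, which is a simplification and not what the program above does.

```c
#include <stddef.h>

/*
 * Exponential moving average seeded with a simple moving average
 * over the first `period` samples. Entries before the seed are
 * left at 0 (a simplification for this sketch).
 * Returns 0 on success, -1 if period is 0 or exceeds n.
 */
int ema_series(const double *price, size_t n, size_t period, double *out){
    size_t i;
    double sum = 0;
    double mult;
    if(period == 0 || period > n) return -1;
    mult = 2.0 / ((double) period + 1.0);
    for(i = 0; i < period; i ++){
        sum += price[i];
        out[i] = 0;
    }
    /* seed: simple moving average of the first `period` samples */
    out[period - 1] = sum / (double) period;
    /* EMA recurrence: move a fraction `mult` of the way toward the new price */
    for(i = period; i < n; i ++)
        out[i] = (price[i] - out[i - 1]) * mult + out[i - 1];
    return 0;
}
```

For prices 1, 2, 3, 4, 5 and a period of 3, the multiplier is 0.5 and the seed is 2, so the next two values are 3 and 4.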
Wednesday, August 13, 2014
Bypassing OANDA's 5000 candle limit
This limit can be bypassed using the following script:
I will pipe this script's output into a C program to produce the trading simulation I mentioned in an earlier post.
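The script itself is not reproduced here. As a rough illustration of how such a bypass can work, the sketch below assumes the limit applies per request and splits a date range into windows of at most 5000 candles, each of which could then be requested separately. The function name `split_range` and its interface are hypothetical and are not taken from dl2.sh.

```c
#include <stddef.h>

#define MAX_CANDLES 5000 /* per-request limit named in the post title */

/*
 * Split [start, end) (UNIX seconds) into windows holding at most
 * MAX_CANDLES candles of `timebase` seconds each.
 * Stores up to max_windows window boundaries in from[] / to[] and
 * returns the number of windows needed (0 on bad input).
 */
size_t split_range(long start, long end, long timebase,
                   long *from, long *to, size_t max_windows){
    size_t count = 0;
    long step = timebase * MAX_CANDLES;
    long t;
    if(timebase <= 0 || end <= start) return 0;
    for(t = start; t < end; t += step){
        if(count < max_windows){
            from[count] = t;
            /* the last window is clipped to the end of the range */
            to[count] = (t + step < end) ? t + step : end;
        }
        count ++;
    }
    return count;
}
```

For the range used in the examples on this blog (1407099600 to 1407531600 with a 60-second timebase, i.e. 7200 candles), this yields two windows.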
Starting with MACD using shell scripts and C
I admit I have no formal or previous experience with trading. This blog is not intended as a guide of any kind. Much less as trading advice. This blog is about the development of my FOREX learning curve, from the very beginning. The notes contained here are for my personal use, but I will share them in case anybody out there wants to collaborate, make suggestions, corrections, criticism, or simply to learn alongside myself.
That said, I'll start by sharing some of the experiences I've collected so far.
I've started by becoming familiar with the basic FOREX trading concepts. I believe I have a good enough grasp on pips, leverage and the MACD to start playing around with an OANDA test account.
Recently I've run some experiments using the OANDA HTTP API, UNIX shell scripts and C programming. So far I haven't been able to turn any profit using my MACD schemes with the OANDA API. I blame a set of factors for this:
- The input parameters I've used are based on only a few trading simulations I ran using the EUR_USD price data of 3.5 days (5000 1-minute samples).
- My simulations do not entirely represent real trading. I must improve them.
- I am using MACD exclusively. I should try combining it with other methods, such as a 1-2-3 scheme and RSI.
To solve the issues which have prevented me from turning a profit with my OANDA test account, I have formulated the following TODO list:
- Writing a proper trading simulator.
- Using that simulator to run a genetic algorithm which finds the optimal MACD parameters (genetic MACD tuning). These parameters include the typical 3 numeric values, and a fourth value representing the time frame for sampling closing bid price values.
- Run my MACD shell script with the obtained values.
- Copy this trading strategy: http://forex-strategies-revealed.com/simple/123-rsi-macd using only shell scripting and C programming, so as to understand the nuts and bolts of MACD, 1-2-3 and RSI.
- Copy the same trading strategy using the suggested pre-existing software packages.