FOREX | UNIX

New blue skies research blog

2015-04-18T05:40:00.003-07:00

I've decided to expand my scientific horizons. I'm keeping track of my research in another blog:

http://cienciadesatada.blogspot.com

My first post there has been about my experiments with neural networks. Enjoy.

Why market forecasting is still impossible

2015-03-09T00:00:00.002-07:00

Here are some texts that explain why we're still far from being able to make accurate predictions of market fluctuations:

Evidence-Based Technical Analysis, by David Aronson:
http://www.evidencebasedta.com/

The (Mis)Behavior of Markets, by B. Mandelbrot and R. L. Hudson:
http://users.math.yale.edu/~bbm3/web_pdfs/misbehaviorprelude.pdf

Debunking Economics, by Steve Keen
http://www.debunkingeconomics.com/

These are some of the best critiques I've seen on the "science" of economics. I've come to the conclusion that economics is a protoscience, more than a science.

My interpretation of Kuperin's paper on price forecasting

2015-02-22T21:33:00.004-08:00

When I was trying to reproduce Kuperin et al.'s experiments:

http://link.springer.com/chapter/10.1007/3-540-47789-6_127
http://arxiv.org/abs/cond-mat/0304469

I produced the following work:

https://www.dropbox.com/s/doyi1lm7ugc7e7f/elmold.tar.gz?dl=0

The file is large because it contains the output data from several weeks of continuous processing. The programs are Elman network trainers based on various genetic algorithms. An animation of one of my experiments can be watched at:

https://www.youtube.com/watch?v=Yz1NYskvFg0

The results were disappointing, so I decided not to upload anything until now. I've uploaded this work only to keep a historical record.

A (crude) contribution to agent-based market simulation, and some ideas for the future

2015-02-22T14:15:00.001-08:00

A while ago I wrote the skeleton for an agent-based market simulator. It's basically just the market-maker with an order book and a couple of unintelligent investors. The investors just place buy and sell orders randomly, and drive the price up or down.

/*
Market simulation
<vomv1988@gmail.com>
Sat Jan 31 13:03:54 CST 2015

Based on
http://www.21stcenturyinvestoreducation.com/page/tce/courses/course-101/004/001-stock-prices.html

I wrote the following market simulator in which the market maker takes a spread of $5 from a commodity which can take a price from $1 to $100.

The investors place random limit orders, and the market maker decides the optimal price in order to guarantee the greatest possible liquidity.
*/

#include <stdio.h>
#include <stdlib.h>

int iabs(int x);

int main(){
int **orderbook;
int price, buysell;

int **orders;
int id, ordprice, order;

int pricerange = 100;
orderbook = (int **) malloc(sizeof(int *) * pricerange);

int orderslen = 1000;
orders = (int **) malloc(sizeof(int *) * orderslen);

int i;
for(i = 0; i < pricerange; i ++)
orderbook[i] = malloc(sizeof(int) * 2);

for(i = 0; i < orderslen; i ++)
orders[i] = malloc(sizeof(int) * 3);

for(i = 0; i < orderslen; i ++){
id = i;
ordprice = rand() % pricerange + 1;
order = rand() % 11 - 5;
orders[i][0] = id;
orders[i][1] = ordprice;
orders[i][2] = order;
}

int j;
for(i = 0; i < pricerange; i ++){
price = i + 1;
orderbook[i][0] = 0;
orderbook[i][1] = 0;
for(j = 0; j < orderslen; j ++)
if(orders[j][1] == price)
if((buysell = orders[j][2]) < 0)
orderbook[i][0] += buysell;
else if((buysell = orders[j][2]) > 0)
orderbook[i][1] += buysell;
}
/*
for(i = 0; i < orderslen; i ++)
printf(
"ID: %d, PRICE: %d, ORDER: %d\n",
orders[i][0],
orders[i][1],
orders[i][2]
);

for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/

int totalorders = 0;

int marketprice = 20;
int spread = 5;
int sold = 0;
int bought = 0;
int transactions;
int bestprice = 0;
int bestorders = totalorders;

int ticks = 1000;

int k;

for(j = 0; j < ticks; j ++){
totalorders = 0;
for(i = 0; i < pricerange; i ++)
totalorders += -1 * orderbook[i][0] + orderbook[i][1];

bestorders = totalorders;
bestprice = 0;

for(marketprice = 1; marketprice <= pricerange; marketprice ++){
sold = bought = 0;
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price <= marketprice + spread)
bought += orderbook[i][0];
if(price >= marketprice)
sold += orderbook[i][1];
}
transactions = bought + sold;

if(iabs(transactions) < iabs(bestorders)){
bestorders = transactions;
bestprice = marketprice;
}
/*
printf(
"MARKETPRICE: %d, BOUGHT: %d, SOLD: %d, TRANSACTIONS: %d\n",
marketprice, bought, sold, transactions
);
*/
}

printf("BESTPRICE: %d, BESTTRANS: %d\n", bestprice, bestorders);

// printf("EXECUTE ORDERS\n");
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price <= bestprice)
orderbook[i][0] = 0;
if(price >= bestprice + spread)
orderbook[i][1] = 0;
}
/*
for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/
// printf("TAKE NEW ORDERS\n");
for(i = 0; i < orderslen; i ++){
id = i;
ordprice = rand() % pricerange + 1;
order = rand() % 11 - 5;
orders[i][0] = id;
orders[i][1] = ordprice;
orders[i][2] = order;
}

for(i = 0; i < pricerange; i ++){
price = i + 1;
orderbook[i][0] = 0;
orderbook[i][1] = 0;
for(k = 0; k < orderslen; k ++)
if(orders[k][1] == price)
if((buysell = orders[k][2]) < 0 && price >= bestprice)
orderbook[i][0] += buysell;
else if((buysell = orders[k][2]) > 0 && price <= bestprice + spread)
orderbook[i][1] += buysell;
}
/*
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price >= bestprice)
orderbook[i][0] += -1 * (rand() % 10 + 1);
if(price <= bestprice + spread)
orderbook[i][1] += rand() % 10 + 1;
}

for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/
}

for(i = 0; i < pricerange; i ++)
free(orderbook[i]);

for(i = 0; i < orderslen; i ++)
free(orders[i]);

free(orderbook);
free(orders);

return 0;
}

int iabs(int x){
if(x < 0)
return -1 * x;
else
return x;
}

Here's an example of price fluctuations produced by this code:

The following tarball contains additional code examples. In some source files, the price volatility can be regulated by setting a range for order volumes. In others, there's an example of front-running price prediction.

https://www.dropbox.com/s/g01i5qntffrekpb/simcom.tar.gz?dl=0

I would have to endow the investors in the simulation with some degree of intelligence in order to make it more realistic. But even if I followed the ideas in http://jasss.soc.surrey.ac.uk/6/3/3.html, there would be some fundamental flaws.

The agent-based simulations I've seen so far seem to take into consideration only past price fluctuations when determining the artificial investors' buy/sell decisions, as if real investors considered nothing else when deciding to buy or sell. In reality, investors also read the news. Some may even base their investments on irrational grounds, such as fortune-telling, the zodiac, or "gut feelings".

A better model would look something like this:

Factors colored in red are ones I haven't seen taken into account in the studies I've read so far. They also seem extremely hard to predict. I think predicting the market requires predicting things such as world events or human history. A natural catastrophe can drive stock and currency prices down, as can a civil war.

I should add that, at this point, I'm starting to think that successful investors are more the product of chance and luck than the product of "intelligent investing". I'm extremely skeptical as to whether a thing such as "intelligent investment" even exists, in the traditional market sense.

This also makes me think that the attempts of so-called "economic analysts" to predict the behavior of markets are also futile. They may think they're doing science, but I think they're far from it. I think economics is in a proto-scientific stage as of now. I see no breakthroughs in market forecasting in the near future. I'm not saying it's hopeless. Just very misinformed, and young. There may be a scientific future to market forecasting, but in the far, far future.

In a practical sense, to me, this means that economics is a field much younger than its professionals might make it seem. At least scientifically. It sounds like an exciting field to try to do science in. But I should expect no significant advances as of today. There is still too much to be found out about human behavior, in order to make economics a viable science.

I will probably be taking a break from this line of research. Perhaps, never to return. It's been a pleasant experience nevertheless.

Cleaner code and some interesting links

2015-02-22T03:03:00.001-08:00

Here's some new code I wrote. I think this is the last code I'll write for trying to predict forex trends using neural networks. It's useless for predicting forex trends, but it has some cool examples of simple recurrent neural networks in C.

https://www.dropbox.com/s/b5cxga1sthtqzds/predict.tar.gz?dl=0

By the way, the time series I used for the experiments in the tarball was taken from

http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.xml

And here are some links with ideas related to what I mentioned in my last post about considering markets as complex systems composed of simpler, smaller parts. I have to say I do consider the examples in the links to be gross oversimplifications. But so do the authors, apparently.

http://www.altreva.com/
http://jasss.soc.surrey.ac.uk/6/3/3.html
http://pages.stern.nyu.edu/~adamodar/New_Home_Page/invfables/investorirrationality.htm

Some conclusions about the behavior of markets

2015-02-16T17:49:00.002-08:00

After a couple of months of attempting to apply my interpretation of Kuperin et al.'s method to the prediction of forex market trends with neural networks, I've come to a few realizations.

I've concluded that neural networks are, in fact, insufficient for predicting market trends. I've also come to believe that technical analysis is, as a whole, just a pseudoscience. I've come to believe these things because I've noticed several recurrent flaws in the studies that presume to demonstrate the efficacy of these forecasting methods.

Take neural networks, for example. The way you test the efficacy of a neural network in predicting market trends is by taking a series of steps:

1. First, you have to choose a network architecture.
2. Then, you acquire some large dataset to train and test the neural network you picked.
3. You train your neural network using a section of this dataset.
4. You "tell" the neural network to make a prediction, and see how close the network's prediction is to the actual data, taken from the same dataset.

One problem lies in the fact that, in step 1, you can choose from an enormous set of different types of neural networks. There are infinitely many different neural networks to choose from, and some of them will perform better than others. Does finding one which produces statistically significant results for a specific dataset really mean anything?

If you choose some neural network architecture, set its number of neurons, tune the sigmoid function for its neurons, and then run your tests, and it turns out that the quality of the predictions is unimpressive, what do you do? You just change the parameters, and try again. You try a different architecture, a different number of neurons, alter the sigmoid, and let's say that, again, you find nothing interesting. So you keep changing the parameters until you come up with a neural network that seems to work.

But what's the actual extent of that neural network's efficacy? Maybe it only works for the dataset you're using. Maybe if you used the data of 10 years from now, the same neural network would produce bad predictions.

So you ask yourself these questions, and you decide to make another test. You train and test your neural network using data from a specific period, say, the usd/euro fluctuations from year 2006. And then you test the same network architecture, by training and testing it using data from 2007, 2008, 2009...

But even if you find a neural network that works along that whole time line, wouldn't you be falling into the same survivorship bias as before? That is: it makes no difference whether you test your network with 2006 data first, and then with 2007, 2008, 2009... data, if you're going to throw it away if it turns out that it doesn't work along the whole time line. That's basically the same thing you did before! It seems to me that this would just be cherry-picking the "best" neural network, out of the infinite set of possible different neural networks. There's still no guarantee that, 10 years from now, that same network you picked will produce good predictions.
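The cherry-picking effect can be illustrated with a small simulation. This is only a sketch under assumptions of my own: each "network" is replaced by a predictor with no skill at all (it guesses up or down by a fair coin flip), it is scored over 3000 up/down guesses, and the function names are hypothetical. Keeping only the best of many such predictors tends to give a score well above the 1500 expected from chance, even though none of them can predict anything.

```c
#include <assert.h>
#include <stdlib.h>

/* Score of one "predictor" that guesses every game by a fair coin flip.
   Each game is won with probability 1/2, whatever the correct answers are. */
int random_predictor_score(int games){
    int i, score = 0;

    for(i = 0; i < games; i ++)
        if(rand() > RAND_MAX / 2)
            score ++;
    return score;
}

/* Try `candidates` skill-less predictors and keep only the best score,
   the way one keeps only the network that "seems to work". */
int best_of_many(int candidates, int games){
    int c, s, best = 0;

    assert(candidates > 0 && games > 0);
    for(c = 0; c < candidates; c ++){
        s = random_predictor_score(games);
        if(s > best)
            best = s;
    }
    return best;
}
```

Calling srand() with a fixed seed and then best_of_many(1000, 3000) shows how far above 1500 the selected score lands. Re-scoring the selected predictor on fresh games brings it back towards 1500, which is the point of the argument above.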

This is no way to do science.

In science, you should have some degree of control over the "independent variables" which affect the outcome you're attempting to predict. Technical analysis assumes that future fluctuations can be predicted, to a degree, from the information of past fluctuations in price. From technical analysis, you could consider the "independent variables" to be a set of past price fluctuations. And the "dependent variable" would be the future fluctuations, which you are trying to predict.

The assumption is that there exists some mathematical formula which takes a set of past fluctuations as its input, and produces a set of future fluctuations as its output. An example of this is the MACD indicator. But MACD suffers from the same survivorship bias problem I laid out before. If you intend to use MACD to predict price fluctuations, you should know that MACD uses a set of parameters. The MACD is actually "tunable". So, if your MACD starts to fail, you can always tune it to "fix it". But then, does the MACD really work at all? Or are you just fooling yourself?
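To make the "tunable" part concrete, here is a minimal sketch of the MACD line: the difference between a fast and a slow exponential moving average of the price. The periods `fast` and `slow` are the knobs (12 and 26 are the commonly quoted defaults, with a third period, usually 9, for a signal line that is left out here). The function names and the choice of seeding both averages with the first price are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One step of an exponential moving average with the given period;
   the smoothing factor 2 / (period + 1) is the usual convention. */
double ema_step(double prev_ema, double price, int period){
    double alpha = 2.0 / (period + 1);

    return alpha * price + (1.0 - alpha) * prev_ema;
}

/* Fills macd[i] with (fast EMA - slow EMA) of prices[0..i].
   Both averages are seeded with the first price (an assumption;
   other seeding conventions exist). */
void macd_line(const double *prices, size_t n, int fast, int slow,
               double *macd){
    size_t i;
    double ema_fast, ema_slow;

    assert(prices != NULL && macd != NULL && n > 0);
    assert(fast > 0 && slow > fast);

    ema_fast = ema_slow = prices[0];
    macd[0] = 0.0;
    for(i = 1; i < n; i ++){
        ema_fast = ema_step(ema_fast, prices[i], fast);
        ema_slow = ema_step(ema_slow, prices[i], slow);
        macd[i] = ema_fast - ema_slow;
    }
}
```

Running macd_line() on the same prices with, say, (12, 26) and then (5, 35) gives two different indicator series, so any trading rule built on top of it changes with the parameters you pick.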

Tuning the MACD, or a neural network architecture for that matter, to fit the data, is the same as changing your theory every time you're confronted with contradicting results. What if there's really no mathematical relation between the past of price fluctuations, and the future of them? I mean, you keep scrambling your formula to try to come up with one that produces good predictions. But if the fluctuations are random, your formula may just be some locally optimal formula that works for your particular dataset. Again: no guarantee that it'll work for future datasets.

You really have no control over the behavior of markets in order to do the kinds of experiments required to extract underlying principles for their behavior. You may think that using different datasets to test your formula is a sort of "control" over the input parameters for the formula. That is: you can use a different set of inputs and outputs, to see if your formula works there too. But you're not controlling the underlying causes that made the data be the way it is. You're just hoping to find some periodicity in a time series. This is not at all the same as stimulating a system (i.e., the market) to find causal relationships.

Maybe there IS a formula that can predict market fluctuations. But I think the correct way to construct this formula, is not by scrambling the weights in a neural network, or the parameters in a MACD, or any other technical analysis indicator. The correct formula should be based on principles extracted from the behavior of economic agents. From an accurate and tested model of the behavior of investors.

I think the correct approach would be to think of markets in the way we think about the weather. People who study the behavior of weather apply the principles of the dynamics of fluids to the complex system that is the weather. That is: principles which have been extracted from controlled lab experiments, where we can actually control the independent variables which affect the physical behaviors of simple components of the weather, such as the air, or water.

The way to construct a correct mathematical model for the behavior of markets would be to extract the principles which govern their simplest components. In the case of markets, the components would be behavior of individual investors, the behavior of companies, and societies, and even the behavior of natural catastrophes or events which might influence an investor's decision to buy or sell. The market is, in fact, an enormously complex system, which involves all these things, and maybe more.

Using something like a neural network with 200 neurons to try to predict the fluctuations produced by such a beast seems to me like a long shot. The correct model would have to be, from my point of view, in all likelihood, much, much more complex than that. Not to mention that the very methodology of using past fluctuation data to try to predict future fluctuations is just wrong. Trying to find periodicity is not the same as modeling.

An example of causal relationships in markets can be learned, for example, from "front running". Front running is a kind of market fraud that takes advantage of a particular market inefficiency which guarantees a rise or fall in price. So people who invest based on front-running actually KNOW that the price is going to fall or rise. The fact that it's documented as a (mostly) certain way of predicting the market which, because of this, is illegal, is proof that there exist basic principles which govern the markets, and allow people to make accurate price predictions. I'm not saying that we should start a front-running scheme. I'm just saying that the underlying principles exist, and they've even been documented and classified as "cheating". Maybe it's possible to extract some other principles, through observation or experiments, in order to produce a better predictive model for price fluctuations.

Bug found in standard deviation calculation.

2015-01-21T14:29:00.000-08:00

I just corrected a bug in musigma.sh: the Z-scores for the euro predictor are actually around +0.7, not +1.5 as I had thought. It may still be significant, especially since the histogram of the Z-scores of the 31 neural networks shows a bell curve with a mean close to +0.7. That is: not just some ugly distribution.

Here's the corrected version:

https://www.dropbox.com/s/itckgrnq77tj7fo/elm.tar.gz?dl=0

Some updates in the code

2015-01-17T12:21:00.001-08:00

Still very messy. But with some upgrades.
https://www.dropbox.com/s/azs4nwwdasvyyye/elm17.tar.gz?dl=0

Let's play a game

2015-01-12T14:56:00.002-08:00

Let's say I have a series of values. The series consists of 4000 values. The first few values are:

1) 1.1789
2) 1.179
3) 1.1743
4) 1.1632
5) 1.1659
6) 1.1569
7) 1.152
...

Now, let's say I show you the first 99 values in the series, and ask you to tell me if the 100th value will be greater than, equal to or less than the 99th value. You can only take a look at the 100th value once you've guessed, and you earn one point every time you are correct in your prediction.

1) 1.1789
2) 1.179
3) 1.1743
...
98) 1.0634
99) 1.0639
100) ???
...

To make things simpler, you will be given only two choices:

a) The 100th value is greater than the 99th value
b) The 100th value is equal to, or less than, the 99th value

Let's say you guessed b) this time around. Ok, now let's reveal the 100th value to see if you were correct:

...
99) 1.0639
100) 1.0572
...

You were right! The 100th value is 1.0572, which is less than 1.0639 (the 99th value). Therefore, b) is the correct answer.

We can keep playing this game using the values from 2 to 100, and trying to guess a) or b) for the 101st value, then taking values 3 to 101, and guessing for 102, and so on until we reach the end of the series.

To sum things up: the game consists of guessing whether the value will go up, down, or stay the same at any point in the series. The rule is that, to make your guess, you are only allowed to look at the 99 values previous to the point you choose in the series.

How can you use the information from the 99 previous values to guess the direction of the next one?

Well, here's an idea: first let's take the first few values from before.

1) 1.1789
2) 1.179
3) 1.1743
4) 1.1632
5) 1.1659
6) 1.1569
7) 1.152
...

Now, let's pay attention to the direction of the value, from one value to the next:

From 1) to 2), the value went up.
From 2) to 3), the value went down.
From 3) to 4), the value went down.
From 4) to 5), the value went up.
...

And so on. We can create a new series using the letters a) and b) from our game. We can use the letter a) when the value goes up, and the letter b) when the value stays the same, or goes down. Our new series will look something like this:

1 to 2) a
2 to 3) b
3 to 4) b
4 to 5) a
...

Etcetera. Our new series is composed only of the letters a) and b). Notice that, if we take 99 values to produce a series like this one, our new series of a)'s and b)'s will contain 98 letters.

Let's suppose that we've counted how many a)'s and how many b)'s there are in our new series of 98 letters. Let's also suppose that, in our new series, there are more a)'s than b)'s. By looking at the previous 99 values, and how there are more increments (a) than decrements (b), we could assume that there is an "upwards trend". That is: up to this point in the series, the value seems to increase more than it tends to decrease.

This may lead us to think the 99th value is more likely to increase, than to decrease or stay the same. That is: the 100th value is more likely to be greater than the 99th value. We may be led to think that a) is the correct answer. But we may be wrong in our guess this time around. And in fact, in this instance, we would be.
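The counting method can be sketched in a few lines of C. The function names are made up for this example, and the tie rule (answer b) when there are as many a)'s as b)'s) is an assumption, since the method described above doesn't say what to do in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 'a' if the value went up from prev to next, and 'b' if it
   stayed the same or went down, following the two choices of the game. */
char direction(double prev, double next){
    return next > prev ? 'a' : 'b';
}

/* Guess the direction of the next value by counting the a)'s and b)'s
   between consecutive values of the window (n values give n - 1 letters).
   A tie is resolved as 'b'; that tie rule is an assumption. */
char guess_by_counting(const double *window, size_t n){
    size_t i, ups = 0, downs = 0;

    assert(window != NULL && n >= 2);
    for(i = 1; i < n; i ++){
        if(direction(window[i - 1], window[i]) == 'a')
            ups ++;
        else
            downs ++;
    }
    return ups > downs ? 'a' : 'b';
}
```

For the game, window would point at the 99 previous values and n would be 99, so the guess is based on 98 letters.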

But what if we keep playing the game using this method?

Let's say we play the game again, using the same guessing method, but this time, using values from 2 to 100 to guess a) or b) for value 101. Then, let's take 3 to 101, and guess for 102, then 4 to 102 and guess for 103... Let's say we keep going like this, until we've played the game 3000 times.

What would our overall score be with our guessing method? Maybe we would guess correctly 1500 times out of 3000, maybe just 1000/3000, or, if we're lucky, 2000/3000!

How can we know if our method is any better than, for example, just flipping a coin to choose from a) or b) each time? Could a sequence of 3000 coin flips produce a lucky score of 2000/3000 or better? If we repeat the same experiment over and over, how often would 3000 coin flips produce such a score? Would counting the a)'s and b)'s for previous values really be any better?
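Before doing the math, these questions can also be answered by brute force. The following is a rough sketch, with function names of my own choosing, that plays the game by coin-flipping and counts how many repetitions of the whole experiment reach a given score. It uses rand() as the coin, so the exact numbers depend on the seed and on the C library.

```c
#include <assert.h>
#include <stdlib.h>

/* One fair coin flip, expressed as one of the two answers of the game. */
char coin_flip(void){
    return rand() > RAND_MAX / 2 ? 'a' : 'b';
}

/* Play n games against the list of correct answers, guessing each one
   with a coin flip. Returns the number of points earned. */
int coin_flip_score(const char *answers, int n){
    int i, score = 0;

    assert(answers != NULL && n > 0);
    for(i = 0; i < n; i ++)
        if(coin_flip() == answers[i])
            score ++;
    return score;
}

/* Repeat the whole n-game experiment `runs` times and count how many
   of the runs reach at least `target` points. */
int count_lucky_runs(const char *answers, int n, int runs, int target){
    int r, lucky = 0;

    for(r = 0; r < runs; r ++)
        if(coin_flip_score(answers, n) >= target)
            lucky ++;
    return lucky;
}
```

With 3000 games per run, count_lucky_runs(answers, 3000, runs, 2000) gives a direct estimate of how often coin flips reach 2000/3000 or better.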

We can use statistics and probability to find out. Let's start by making a score history for the 3000 times we played the game. This history may look like this:

From 99 to 100) The answer was a), and you guessed b). You earn 0 points.
From 100 to 101) The answer was b), and you guessed b). You earn 1 point.
From 101 to 102) The answer was b), and you guessed a). You earn 0 points.
...
From 3098 to 3099) The answer was a), and you guessed a). You earn 1 point.

Notice that there are three lists here: the list of "correct answers", the list of "guesses", and the list of "points earned". Each of these lists has 3000 elements, one for each time we played the game. We can make a table with these three lists:

Correct answers	Guesses	Points earned
a	b	0
b	b	1
b	a	0

...

As you can see, you only earn one point if your guess matches the correct answer. Let's say we know how many a)'s and b)'s there are in the "correct answers" list. We can estimate the probability of any single element of the list being a) by simply dividing the amount of a)'s by 3000. The same may be applied to the probability of b) in that list.

Let's use the following names for the probabilities:

We will use Pc(a), for the probability of any single element of the "correct answers" list being a)
And Pc(b), for the probability of any single element of the "correct answers" list being b)

Now, what about the list of "guesses"? If we're guessing by coin flips, then the probability of guessing a) would be 1/2, and thus, the probability of guessing b) would also be 1/2. That is: the probability of any single element of the "guesses" list being a) would be 1/2; as would be the probability of it being b).

Let's also use names here:

We will use Pg(a), for the probability of any single element of the "guesses" list being a)
And Pg(b), for the probability of any single element of the "guesses" list being b)

Given these two pairs of probabilities, what is the probability of any given element of the "points earned" list being 1? And what is the probability of it being 0? We can use the addition and multiplication rules of probability to determine this. Remember that you only get one point if your guess matches the correct answer. So the probability of earning 1 point would be:

Pp(1) = ( Pc(a) * Pg(a) ) + ( Pc(b) * Pg(b) )

Similarly, the probability of earning 0 points would be

Pp(0) = ( Pc(a) * Pg(b) ) + ( Pc(b) * Pg(a) )

Using these calculations, we can know how likely it would be for us to earn one point by flipping a coin in any single game out of the 3000, without even having to flip the coin 3000 times!

Let's make an example by using actual probability values. Let's say that, in our "correct answers" list, there are 1575 a)'s, and 1425 b)'s. This leaves us with:

Pc(a) = 1575 / 3000 = 0.525
Pc(b) = 1425 / 3000 = 0.475

If we use the coin-flipping method to make our guesses, the probability of succeeding in any single game out of the 3000 would be:

Pp(1) = ( 0.525 * (1/2) ) + ( 0.475 * (1/2) ) = 0.5

We have a 50% chance of earning 1 point for each one of the 3000 games. This means that, in the end, we will probably only win half of the times we play. This gives us a likely final score of 3000 * 0.5 = 1500 points, for the coin-flipping method.
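The formula for Pp(1) is easy to turn into code. This is a small sketch, assuming (as with a coin) that the guess is independent of the correct answer; the function name is hypothetical. Since there are only two answers, Pc(b) and Pg(b) are computed as 1 - Pc(a) and 1 - Pg(a).

```c
#include <assert.h>

/* Probability of earning one point in a single game.
   pc_a: probability that the correct answer is a), Pc(a)
   pg_a: probability that the guess is a), Pg(a)
   Assumes the guess is independent of the correct answer. */
double point_probability(double pc_a, double pg_a){
    assert(pc_a >= 0.0 && pc_a <= 1.0);
    assert(pg_a >= 0.0 && pg_a <= 1.0);

    return pc_a * pg_a + (1.0 - pc_a) * (1.0 - pg_a);
}
```

With pc_a = 0.525 and pg_a = 0.5 this gives 0.5, the coin-flipping case. With pg_a = 1.0 (always guessing a) it gives 0.525, which shows that a guesser with no insight at all can still beat 1/2 when the answers themselves are unbalanced.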

But if we were to actually carry out the experiment in real life, we would notice that getting exactly 1500 points each time is actually not very likely! Why is this so? Well, in real life, probability is only an estimation. Even if you have a perfect coin, with exactly 1/2 of probability for a) and b), there is still a possibility that, in the 3000 tosses, the coin will guess a) more often than b), or vice versa.

Likewise, there is a possibility that you will get all 1's in the "points earned" list, by just flipping coins! How likely is this possibility? Not very. Let's see why...

Let's ask ourselves the following question: how many different patterns of 0's and 1's are possible for a list of 3000 0's and 1's? If you can count in binary, you will notice that we can consider the "points earned" list as just a binary number with 3000 bits. If we count in binary from 000...0 to 111...1, for a 3000-bit long number, we would get 2³⁰⁰⁰ different binary patterns. That is a lot of different patterns: 2³⁰⁰⁰ different ways in which the "points earned" list may turn out in the end.

Now, let's ask ourselves this: how many of these 2³⁰⁰⁰ possible binary patterns have all 1's? Only one! The last one. The rest of them have at least one zero.

This is how we can know exactly how small the probability of getting all 1's really is. Note that every single possible pattern for the "points earned" list is just as likely to appear as any other (because Pp(0) = Pp(1)). Therefore, a perfect score has a probability of only

Ps(3000) = (1 / 2³⁰⁰⁰)

of ever appearing by coin-flipping.

Now, a more complicated question: how likely is it to get a score of 2/3000 by coin-flipping? We would have to count the number of patterns, in the set of 2³⁰⁰⁰ possible patterns, which have 2 ones and 2998 zeros. Fortunately for us, there exists a formula which tells us exactly how many there are. It's called the formula for "permutations with repetition of indistinguishable objects"; with only two kinds of objects (ones and zeros), it is the same as the binomial coefficient. For simplicity's sake, I will not go into the details of using this formula here. For now, let's just assume it works. If you wish, you may google it later.

The way to calculate the probability for any particular final score X would be to:

1. Obtain the amount of different binary patterns with X ones and (3000 - X) zeros using the formula for "permutations with repetition of indistinguishable objects"

2. Divide that number by 2³⁰⁰⁰

The point of all this is that we can use these formulas to know the probability of getting any single final score by coin-flipping. If we were to calculate the complete list of these probabilities, for scores 0 through 3000, we would end up with a "binomial probability distribution".
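Here's one possible way to compute that list of probabilities in C. It's a sketch with made-up function names. The pattern counts are far too large for any integer type, and 2³⁰⁰⁰ doesn't fit in a double either, so instead of computing them directly the code builds every probability relative to the most likely score, using the fact that C(n, k+1) = C(n, k) * (n - k) / (k + 1), and normalizes at the end so that the list adds up to 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills prob[0..n] with the probability of each final score 0..n when
   every game is won independently with probability p (0 < p < 1).
   Returns 0 on success, -1 on bad input. */
int score_distribution(int n, double p, double *prob){
    int k, mode;
    double odds, total = 0.0;

    if(n < 1 || p <= 0.0 || p >= 1.0 || prob == NULL)
        return -1;

    odds = p / (1.0 - p);
    mode = (int) ((n + 1) * p);     /* most likely score */
    if(mode > n)
        mode = n;

    /* Unnormalized weights, walking up and down from the mode. */
    prob[mode] = 1.0;
    for(k = mode; k < n; k ++)
        prob[k + 1] = prob[k] * odds * (n - k) / (k + 1);
    for(k = mode; k > 0; k --)
        prob[k - 1] = prob[k] * k / (odds * (n - k + 1));

    for(k = 0; k <= n; k ++)
        total += prob[k];
    for(k = 0; k <= n; k ++)
        prob[k] /= total;
    return 0;
}

/* Probability of reaching at least `score` points. */
double tail_probability(const double *prob, int n, int score){
    int k;
    double sum = 0.0;

    assert(prob != NULL);
    for(k = n; k >= score && k >= 0; k --)
        sum += prob[k];
    return sum;
}
```

For the game, prob needs room for 3001 doubles, and tail_probability(prob, 3000, x) answers the question "how often would coin-flipping score x or better?". Scores very far from the mean come out as 0 because their probability is smaller than a double can represent.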

In our example, the highest probabilities would be distributed around a final score of 1500. Higher or lower scores would be ever less probable. This would account for the way in which we get different scores every time we play the game 3000 times with the coin-flipping method. Most of these times, our final score will be close to 1500. Although a fluke of, for example, 2923 is always possible, it is also very, very, very unlikely.

How, then, does this help us determine whether our a)/b)-counting method is any better than coin-flipping? We can tell exactly how much better (or worse) counting is than coin-flipping by looking at how far the counting score is from 1500.

Remember that the highest probabilities are distributed around a final score of 1500 for coin-flipping. If we get a score of 1582 by using the counting method, then we should ask ourselves: how likely would it have been to get that same score, by just coin-flipping?

If the difference between the likely coin-flipping score (1500) and the counting score (1582) is big enough, then we can tell that our counting method is superior to just coin-flipping. The question now is: how big is big enough?

It turns out there is a standard way of measuring this difference. The standard measurement of this difference is called the "Z-score", and its unit is the "sigma".

It turns out that, for the binomial distribution obtained from the coin-flipping method, almost all of the probability is concentrated within 3 sigmas of the mean, on either side. The mean being 1500 for this case.

Any counting score which is around +3 sigmas, or more, away from the mean, can be considered vastly superior to coin-flipping. This is because the binomial probability distribution of mere coin-flipping would make it almost impossible to reach the same score with coin-flipping.

But, how can we know whether our specific score of 1582 is +3 sigmas away from the mean of 1500? All we have to do is standardize. For this, we will use the formulas for the mean and variance used in binomially distributed probability.

For our particular case:

n = number of experiments (3000)
p = probability of success (Pp(1) = 0.5)

The mean is:

m = p * n = 0.5 * 3000 = 1500

And the variance is

v = p * n * (1 - p) = 0.5 * 3000 * 0.5 = 750

The Z-score for a final score x = 1582 would then be:

z = (x - m) / sqrt(v) = (1582 - 1500) / 27.39 = 2.99

A score of 1582 is +2.99 sigmas away from the mean of 1500! This would be large enough to be considered "statistically significant". That is: far, far better than just guessing by coin-flipping.
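The standardization can be written as a short C function. This is a sketch with hypothetical names; a small Newton iteration stands in for sqrt() from <math.h> only so that the example compiles without linking the math library.

```c
#include <assert.h>

/* Square root by Newton's method; stands in for sqrt() from <math.h>. */
double newton_sqrt(double v){
    double r = v > 1.0 ? v : 1.0;
    int i;

    assert(v >= 0.0);
    if(v == 0.0)
        return 0.0;
    for(i = 0; i < 100; i ++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Z-score of a final score x out of n games, measured against a guesser
   that wins each game with probability p (0.5 for a fair coin). */
double z_score(int x, int n, double p){
    double mean, variance;

    assert(n > 0 && p > 0.0 && p < 1.0);
    mean = p * n;
    variance = p * n * (1.0 - p);
    return (x - mean) / newton_sqrt(variance);
}
```

For the example above the call would be z_score(1582, 3000, 0.5). A negative result means the method did worse than the coin.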

Now for the good part...

Let's say the 4000 values from the original list are actually daily currency exchange rates for the EUR_USD pair. The previous method could help us in finding out if a neural-network-based price direction predictor is any better than a merely random price direction predictor.

The exact same comparison would apply. But instead of using the a)/b)-counting method, a neural network would be used.

In fact, I've already used a type of Simple Recurrent Network called the Elman Network to play the exact same game. By training an Elman network with a simple greedy algorithm using the 99 previous values mentioned in the game, I have been able to obtain Z-scores of +1.5 on average. I should mention that feeding random series to the network results in much lower Z-scores. That is: the network can tell the difference between artificial, random series created by me, and series which come from real FOREX market history.
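For readers who haven't met Elman networks: the hidden layer receives, besides the current input, a copy of its own activations from the previous step (the "context" units). The following is only an illustrative sketch of one forward step, not the code from my experiments. The sizes, names, the linear output unit and the softsign activation (used here in place of a sigmoid so that no math library is needed) are all assumptions made for the example, and training is left out.

```c
#include <assert.h>
#include <stddef.h>

#define N_HIDDEN 3   /* hypothetical size, chosen only for the example */

/* Softsign activation x / (1 + |x|), standing in for a sigmoid. */
double squash(double x){
    return x / (1.0 + (x < 0.0 ? -x : x));
}

struct elman {
    double w_in[N_HIDDEN];              /* input   -> hidden */
    double w_ctx[N_HIDDEN][N_HIDDEN];   /* context -> hidden */
    double w_out[N_HIDDEN];             /* hidden  -> output */
    double context[N_HIDDEN];           /* previous hidden activations */
};

/* Feed one value to the network and return its output. The hidden
   activations are copied into the context units afterwards, which is
   what gives the network a memory of earlier inputs. */
double elman_step(struct elman *net, double input){
    double hidden[N_HIDDEN], sum, output = 0.0;
    int i, j;

    assert(net != NULL);
    for(i = 0; i < N_HIDDEN; i ++){
        sum = net->w_in[i] * input;
        for(j = 0; j < N_HIDDEN; j ++)
            sum += net->w_ctx[i][j] * net->context[j];
        hidden[i] = squash(sum);
    }
    for(i = 0; i < N_HIDDEN; i ++){
        output += net->w_out[i] * hidden[i];
        net->context[i] = hidden[i];
    }
    return output;
}
```

To play the game, the 99 previous values would be fed one at a time through elman_step(), and the last output would be read as the prediction. Feeding the same input twice generally gives two different outputs, because the context units have changed in between.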

A very messy, preliminary version of the source code I've used for this can be downloaded from here. I must warn you that the code is provided as-is, without any instructions. The results I mentioned are perfectly obtainable using this software, but you must know how to compile and run the programs. You may be able to do this on your own, but I encourage you to e-mail me with any questions.

LEGAL NOTICE: As I am currently unemployed and attempting to make my way to an Erasmus Mundus scholarship (or any other professional opportunity available to me), I will very much appreciate it if you ask me for permission before using my work in any way or form. If you are a scientist, academician or otherwise professional, interested in my work, please e-mail me at: vomv1988@gmail.com

Cheers!

Some source code for the previous post

2014-10-07T14:28:00.000-07:00

This is the RNN and the data sets from the previous post about the RNN target data fitting.

https://www.dropbox.com/s/blv7zci9p37gvy4/eurusd4.tar.gz?dl=0

https://www.youtube.com/watch?v=ShdmErv5jvs

2014-10-07T14:13:00.000-07:00

4dalulz
https://www.youtube.com/watch?v=ShdmErv5jvs

Fitting Pybrain's RNN prediction

2014-10-07T13:34:00.001-07:00

After fiddling with some parameters in my original PyBrain RNN (such as the number of neurons, the training data set size and the normalization factor for the target data set), I've been able to produce RNN predictions which fit the shape of the target set, but not its amplitude or its horizontal and vertical position.

The following picture illustrates this problem. The blue signal is the RNN target set. The green signal is the raw RNN output for the training set input. The red signal is a crude attempt at fitting the green signal to the blue signal. It's merely a vertical displacement, together with an increase in amplitude, of the raw green signal.

Notice also how the red signal seems to be shifted to the right of the blue one. The following picture shows the prediction for 25 days after the last date in the training set. It has the same amplitude and vertical shifts as the previous picture, but I've added a leftward horizontal shift of two days to this one.

I think I could get the red and blue signals pretty close with a simulated annealing algorithm which finds the best horizontal, vertical and amplitude shifts for the original raw green signal. I would have to use only the training set information for this, in order to make it useful for real-life prediction.
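As a simpler stand-in for the simulated annealing idea, the amplitude and vertical shift can be obtained in closed form by ordinary least squares for each candidate horizontal shift, and the best shift picked by brute force. The sketch below does that. It is not the method used for the pictures above, and the names fit_scale_offset and best_lag are hypothetical.

```c
/* Least-squares fit of target[i] ~ a * raw[i + lag] + b over the overlap.
   Writes a and b and returns the sum of squared residuals, or -1.0 if
   the fit is undefined (fewer than 2 points, or a flat raw signal). */
double fit_scale_offset(const double *raw, const double *target, int n,
                        int lag, double *a, double *b){
    int i, m = n - lag;
    double sx = 0, sy = 0, sxx = 0, sxy = 0, sse = 0, den;
    if(lag < 0 || m < 2)
        return -1.0;
    for(i = 0; i < m; i++){
        double x = raw[i + lag], y = target[i];
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    den = m * sxx - sx * sx;
    if(den == 0.0)
        return -1.0;
    *a = (m * sxy - sx * sy) / den;   /* amplitude factor */
    *b = (sy - *a * sx) / m;          /* vertical shift */
    for(i = 0; i < m; i++){
        double r = target[i] - (*a * raw[i + lag] + *b);
        sse += r * r;
    }
    return sse;
}

/* Try every leftward shift from 0 to max_lag and keep the one with the
   smallest mean squared residual. Returns that shift, or -1 if no shift
   produced a valid fit. */
int best_lag(const double *raw, const double *target, int n, int max_lag,
             double *a, double *b){
    int lag, best = -1;
    double best_mse = 0, ta, tb;
    for(lag = 0; lag <= max_lag; lag++){
        double sse = fit_scale_offset(raw, target, n, lag, &ta, &tb);
        double mse;
        if(sse < 0)
            continue;
        mse = sse / (n - lag);
        if(best < 0 || mse < best_mse){
            best = lag; best_mse = mse; *a = ta; *b = tb;
        }
    }
    return best;
}
```

To stay useful for real-life prediction, raw and target would both have to come from the training set only, and the resulting a, b and shift would then be applied unchanged to the predicted signal.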

I think I'm getting pretty close to something useful here. I'll be posting some source code later.

First attempt at implementing recurrent neural network FOREX trend prediction

2014-09-24T19:53:00.001-07:00

My first attempts at implementing the experiment from the Russian paper with neurolab failed miserably. It just doesn't have all the options I needed. I've decided to switch to PyBrain (https://github.com/pybrain/pybrain/wiki/installation). Note that I've only worked with the EUR/USD trading rate.

Although the results were better, they were not satisfactory. The following plot shows how close the Elman RNN got to the normalized training set. After around 1500 epochs, PyBrain couldn't get any closer. Also, the quality seems to depend heavily on the initial random values of the net.

I did not follow the paper closely. An important step I missed was using the modified logistic function. I tried modifying it in the file named /usr/local/lib/python2.7/dist-packages/PyBrain-0.3.1-py2.7.egg/pybrain/tools/functions.py, but PyBrain's training function failed after the modification.

If you would like to try it yourself, you can find the source code and all the instructions here:

https://www.dropbox.com/s/3v63go1zlfux9l5/euro2.tar.gz?dl=0

All you have to do is install the dependencies and run a couple of shell scripts. The scripts will download the data directly from the European Central Bank, create and train the RNN, and generate the plots automatically.

My next steps will be to: 1. write an RNN builder and trainer in C from scratch, and 2. try to predict only return rate direction (positive or negative) with PyBrain.

Considering an ARIMA model

2014-08-31T20:39:00.003-07:00

I came across a blog post and a paper describing methods for implementing an ARIMA model as a price forecaster. They both deal with the stock market, which is not exactly the same as FOREX. But given the lack of ARIMA examples involving FOREX, I've decided to take a look at these:

http://programming-r-pro-bro.blogspot.mx/2013/04/forecasting-stock-returns-using-arima.html
http://www.hindawi.com/journals/jam/2014/614342/

From the blog post it seems that ARIMA doesn't have a very good resolution in its predictions. The most it seems to be able to predict is a likely dynamic range for the future return rate. The paper shows how an ANN predicts fluctuations more closely. Still: this paper uses an FFNN, whereas the Russian paper from one of my earlier posts uses an SRN.

ARIMA seems more limited than the SRN in its predictive power. Therefore, I have decided to give priority to the SRN approach using the newelm function from neurolab which I mentioned in an earlier post.

The ARIMA implementation, nevertheless, has inspired me to attempt a new rustic buy-and-hold algorithm. One which takes into account the density of price falls through time, and a likely dynamic range for it. I will be reporting on the results later on.

Kalman filter disappointment

2014-08-31T20:32:00.002-07:00

I found two identical implementations for a univariate Kalman filter:

http://greg.czerniak.info/guides/kalman1/
http://bilgin.esme.org/BitsBytes/KalmanFilterforDummies.aspx

From the formulae for K_k and P_k, it seems that all this Kalman filter does is damp the signal, as K_k simply decreases with each iteration, which is not very impressive.
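The behaviour of K_k can be checked with a few lines of C. The sketch below assumes the usual scalar predict/update equations for a filter that tracks a single value. The struct and function names are hypothetical, and the process noise q is included only as an optional parameter.

```c
/* State of a scalar (univariate) Kalman filter that tracks one value. */
typedef struct {
    double x;   /* current estimate */
    double p;   /* error covariance of the estimate, P_k */
    double q;   /* process noise variance (0 for a constant signal) */
    double r;   /* measurement noise variance */
    double k;   /* last Kalman gain, K_k */
} kalman1d;

/* One predict/update step with measurement z. Returns the new estimate. */
double kalman1d_step(kalman1d *f, double z){
    double p_prior = f->p + f->q;        /* predict: the estimate is carried over */
    f->k = p_prior / (p_prior + f->r);   /* gain K_k */
    f->x = f->x + f->k * (z - f->x);     /* correct with the measurement */
    f->p = (1.0 - f->k) * p_prior;       /* updated covariance P_k */
    return f->x;
}
```

With q set to 0, P_k shrinks at every step and so does K_k, so each new measurement moves the estimate less than the previous one. A nonzero q keeps the gain from decaying all the way to zero.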

I wasn't able to extract any "predictive" component from this implementation of the Kalman filter. I've decided to abandon this approach.

Using a Kalman filter to predict short-term currency price fluctuation

2014-08-28T15:03:00.001-07:00

In the previous post I showed that price fluctuations have a Gaussian distribution. This resembles Gaussian noise. Apparently, there are several ways to predict Gaussian time series. One is described here:

http://www.gaussianprocess.org/gpml/chapters/RW.pdf

But it involves concepts which are too advanced and complicated for me at this moment. Besides: Gaussian processes are not the same as Gaussian noise. It seems that a Gaussian process is formally defined as a process involving a "multivariate normal distribution". Since the price fluctuations I'm studying have only one variable, I believe using Gaussian process prediction methods would be overkill. I might be wrong, though.

Another method is one used to filter out Gaussian noise. It's called "Kalman filtering". Kalman filtering seems useful, because it works by predicting Gaussian noise in order to eliminate it. Because currency price fluctuations follow a Gaussian distribution, I think the predictive component of the Kalman filter may be useful in predicting short-term future fluctuations in currency price.

The following seems like a good, comprehensive, introduction to Kalman filtering:

http://www.cs.unc.edu/~tracker/media/pdf/SIGGRAPH2001_CoursePack_08.pdf

So the basic idea is to treat FOREX price fluctuation as if it were Gaussian noise, and try to predict it short-term with a Kalman filter.

I want to be able to predict short-term future price fluctuations because I discovered that sudden and large price decrements produce important losses when using my rustic buy-and-hold algorithm. These predictions may turn out to be useful in setting up an effective predictive stop-loss alarm for my rustic buy-and-hold algorithm.

Found normal distributions in price changes and price decrement frequencies

2014-08-24T02:43:00.000-07:00

I've been measuring the distribution of negative fluctuations in FOREX market history for the EUR_USD pair. I discovered that most variations in price (from one instant to the next, with a time base of 1 minute) are small. Sudden, large increments or decrements are scarce. The following plot shows the sorted (smallest to largest) price changes (both positive and negative) over many weeks (each week has a different color).

The plot shows a large amount of "small" values near the center (which is zero), and only a few large ones near the sides. This implies that a time span with a high density of price decrements is more likely to contain large negative price fluctuations than a span with few price decrements.

To extract the frequency of decrements at any point in time, only the direction of the currency price is required. The following picture shows price decrements in red, and increments in green. Blank spaces represent unknown currency fluctuations in a given instant. What I try to find here is how many red squares there are within a given window. Then I shift that window to the right, and count the red squares again. I keep shifting and counting until I reach the end of the data set.
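The shift-and-count procedure can be sketched as a sliding window that adds the sample entering the window and removes the one leaving it. The function name and the encoding of directions (-1 for a decrement, +1 for an increment, 0 for unchanged or unknown) are assumptions made for this illustration.

```c
/* For every window of `width` consecutive samples, count how many
   direction values are negative. counts must have room for
   n - width + 1 entries. Returns the number of windows written. */
int count_decrements(const int *dirs, int n, int width, int *counts){
    int i, c = 0;
    if(width <= 0 || width > n)
        return 0;
    for(i = 0; i < width; i++)      /* first window */
        if(dirs[i] < 0)
            c++;
    counts[0] = c;
    for(i = width; i < n; i++){
        if(dirs[i] < 0)             /* sample entering the window */
            c++;
        if(dirs[i - width] < 0)     /* sample leaving the window */
            c--;
        counts[i - width + 1] = c;
    }
    return n - width + 1;
}
```

Each entry of counts is the number of red squares in one position of the window, so the whole array is the set of frequencies whose distribution is examined next.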

Interestingly, it turns out the set of frequencies (sampled by shifting the window from top-left to bottom-right) also has a normal distribution. This means there are more "small" frequencies than "big" ones in all consecutive windows of time.

Plotting the histogram for both data-sets showed an approximate normal (Gaussian) distribution.
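A histogram like the ones mentioned can be produced by counting values into equal-width bins and plotting the counts. The following is a minimal sketch with a hypothetical function name. The bin range and the number of bins are free parameters that are not taken from the original experiment.

```c
/* Count how many of the n values fall into each of nbins equal-width
   bins covering [lo, hi). Values outside that range are skipped.
   Returns the number of values counted, or -1 on bad arguments. */
int histogram(const double *v, int n, double lo, double hi,
              int nbins, int *bins){
    int i, b, counted = 0;
    if(nbins <= 0 || hi <= lo)
        return -1;
    for(b = 0; b < nbins; b++)
        bins[b] = 0;
    for(i = 0; i < n; i++){
        if(v[i] < lo || v[i] >= hi)
            continue;
        b = (int)((v[i] - lo) / (hi - lo) * nbins);
        if(b >= nbins)      /* guard against rounding at the upper edge */
            b = nbins - 1;
        bins[b]++;
        counted++;
    }
    return counted;
}
```

For price changes, a range that is symmetric around zero keeps the peak of the distribution in the middle bins.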

This shows that the variations in price are mostly smooth, with occasional outbursts. That is: they are more likely to fluctuate within a given (relatively) small range at any given instant. There is a certain stability in currency price change.

First attempt at a rustic buy-and-hold forex bot

2014-08-24T02:14:00.000-07:00

I've managed to detect the beginning of tiny price jumps. Maybe if I tune the program's parameters I can make it find the beginning of larger jumps.

Here's the source code. It's a little rough, fyi:

 /*  
      Rustic buy-and-hold  
      Requires fine tuning. Perhaps make it adaptive.  
   
      Vicente Oscar Mier Vela  
      <vomv1988@gmail.com>  
   
      Example  
      $ cat samples_1407099600_1407531600_M1.dat | ./t6 1407099600 1407531600 60 200 20 20  
   
      Example of samples_START_END_TB.dat  
      $ cat samples_1407099600_1407531600_M1.dat | head  
      1407099600  
      1.34247  
      1.34359  
      1407099660  
      1.34285  
      1.34326  
      1407099720  
      1.34291  
      1.34331  
      1407099780  
   
      This one tries to link density of losses with absolute losses using  
      local average minimum / maximum algorithm.  
 */  
 #include <stdio.h>  
 #include <stdlib.h>  
 #include <math.h>  
   
 #define closeBid 0  
 #define closeAsk 1  
   
 int cmp(const void *x, const void *y);  
   
 int main(int argc, char *argv[]){  
      int startdate = atoi(argv[1]);  
      int enddate = atoi(argv[2]);  
      int timebase = atoi(argv[3]);  
   
      int nsamples = (enddate - startdate) / timebase;  
      int expdate;  
      int currdate;  
   
      double *samples[2];  
      samples[closeBid] = malloc(sizeof(double)*nsamples);  
      samples[closeAsk] = malloc(sizeof(double)*nsamples);  
   
      int i;  
      for(i = 0, expdate = startdate; i < nsamples; i ++, expdate += timebase){  
           scanf("%d", &currdate);  
 /*  
           The following assumes that the first date from the  
           dataset will always yield a value. That is: 1st date is  
           always "ticked".  
 */  
           if(currdate == expdate){  
                scanf("%lf", samples[closeBid] + i);  
                scanf("%lf", samples[closeAsk] + i);  
           } else {  
                while(currdate != expdate && i < nsamples){  
                     samples[closeBid][i] = samples[closeBid][i-1];  
                     samples[closeAsk][i] = samples[closeAsk][i-1];  
                     expdate += timebase;  
                     i++;  
                }  
                if(i < nsamples){  
                     scanf("%lf", samples[closeBid] + i);  
                     scanf("%lf", samples[closeAsk] + i);  
                }  
           }  
      }  
   
      int *diffs = (int *) malloc(sizeof(int) * nsamples);  
      diffs[0] = 0;  
      for(i = 1; i < nsamples; i ++)  
           if(samples[closeBid][i] - samples[closeBid][i - 1] < 0)  
                diffs[i] = -1;  
           else if(samples[closeBid][i] - samples[closeBid][i - 1] > 0)  
                diffs[i] = 1;  
           else  
                diffs[i] = 0;  
   
      /* calloc() zero-initializes every counter before the counting  
         loop below increments them */  
      int *diffs2 = (int *) calloc(nsamples, sizeof(int));  
      int k;  
      int range = 10;  
   
      for(i = range; i < nsamples - range; i ++)  
           for(k = range * -1 ; k < 0 ; k ++){  
                if(diffs[i + k] == -1)  
                     diffs2[i] ++;  
           }  
   
      for(i = 0; i < nsamples; i ++)  
           printf("%d >> %d\n", diffs2[i], diffs[i]);  
   
      int j, t = atoi(argv[4]), mxs = atoi(argv[5]), mns = atoi(argv[6]);  
      double *s = (double *) malloc(sizeof(double) * t);  
      double *q;  
      double sum, loc_avg_min = 0, loc_avg_max = 0, bal = 0;  
      int flag = 0;  
      for(i = 0; i + t < nsamples; i ++){  
           q = samples[closeBid] + i;  
   
           for(j=0;j<t;j++)  
                s[j] = q[j];  
   
           qsort(s, t, sizeof(double), cmp);  
   
           for(j = 0, sum = 0; j < mns; sum += s[j], j ++);  
           loc_avg_min = sum / (double) mns;  
   
           for(j = t - 1, sum = 0; j >= t - mxs; sum += s[j], j --);  
           loc_avg_max = sum / (double) mxs;  
   
           if(flag == 0 && samples[closeBid][i + t - 1] <= loc_avg_min){  
                bal -= samples[closeAsk][i + t - 1];  
                flag = 1;  
           }  
   
           if(flag == 1 && samples[closeBid][i + t - 1] >= loc_avg_max){  
                bal += samples[closeBid][i + t - 1];  
                flag = 0;  
                printf("%f %d\n", bal, diffs2[i + t - 1]);  
           }  
      }  
   
      free(samples[closeBid]);  
      free(samples[closeAsk]);  
      free(s);  
      free(diffs);  
      free(diffs2);  
   
      return 0;  
 }  
   
 int cmp(const void *x, const void *y){  
      double xx = *(double*)x, yy = *(double*)y;  
      if (xx < yy) return -1;  
      if (xx > yy) return 1;  
      return 0;  
 }

Idea for a very rustic buy-and-hold algorithm

2014-08-20T16:28:00.002-07:00

From NN's to buy-and-hold. From the relatively complex, to the very simple.

In these last few days I've been imagining a new (?) algorithm for doing simple buy-and-hold trades. The key is in finding the most probable highest and lowest currency prices for a given point in time. It would involve something like taking a list of the last N prices, sorting it, and averaging the top M values to get some "local average maximum" and the bottom L values to get a "local average minimum". Then, placing limit orders with those values. Come to think of it, it sounds a little like RSI's overbought and oversold indicators. But it's not quite it.
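The sort-and-average step can be sketched as a single C function. The function and parameter names are hypothetical. nhigh and nlow correspond to M and L in the description above.

```c
#include <stdlib.h>

/* Ascending comparison for qsort(). */
static int cmp_double(const void *x, const void *y){
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Sort a copy of the last n prices, then average the lowest `nlow`
   values into *avg_min and the highest `nhigh` values into *avg_max.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int local_avg_extremes(const double *prices, int n, int nhigh, int nlow,
                       double *avg_max, double *avg_min){
    int i;
    double sum;
    double *s;
    if(n <= 0 || nhigh <= 0 || nlow <= 0 || nhigh > n || nlow > n)
        return -1;
    s = malloc(sizeof(double) * n);
    if(s == NULL)
        return -1;
    for(i = 0; i < n; i++)          /* work on a copy: prices stays ordered in time */
        s[i] = prices[i];
    qsort(s, n, sizeof(double), cmp_double);
    for(i = 0, sum = 0; i < nlow; i++)
        sum += s[i];
    *avg_min = sum / nlow;
    for(i = n - nhigh, sum = 0; i < n; i++)
        sum += s[i];
    *avg_max = sum / nhigh;
    free(s);
    return 0;
}
```

The two averages would then serve as the prices for the buy and sell limit orders, recomputed every time a new price arrives.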

Pretty crude. But I'll do it while I work on the NN approach, and see what happens.

Second thoughts about tuning the MACD with a GA

2014-08-20T15:42:00.000-07:00

Let's say that I find the most profitable MACD parameters for a certain month or year. This is no guarantee that the same parameters will be profitable next year or month. I think trying to find optimal MACD parameters for a particular dataset would be curve-fitting. I recently found a message board post about skepticism towards technical analysis. It's from a trusted source (the James Randi website [atheist/skeptic/critical thinker here BTW]). Do take a look at the references mentioned by the posters, though:

http://forums.randi.org/showthread.php?t=96372

That said, I've come across a paper which says neural networks do a good job predicting forex market trends (http://arxiv.org/pdf/cond-mat/0304469.pdf). It uses a neural network architecture which is a mix between an Elman and a Jordan SRN. I believe it doesn't say what training algorithm they used to teach the network to predict market trends. In any case, I will probably be using RTRL, because it seems less resource-consuming than BPTT. I doubt my crappy computer can handle BPTT for the large amounts of data I plan to feed my SRN. Also, I would like to start with a pure Elman architecture, instead of the "Elman-Jordan" architecture suggested in the paper. I just don't have the expertise in NN's to copy the paper step by step.

I should mention I've never implemented an FFN, much less an SRN. After spending several days looking into the details of how FFN's and SRN's work, I've come up with a new TODO list:

Understand how FFN's work (check)
Understand how SRN's work (check)
Understand the backpropagation algorithm for FFN's
Understand the BPTT algorithm for SRN's
Understand the RTRL algorithm for SRN's

It would seem one can't understand RTRL without first understanding BPTT and the classic BP algorithm, as they are somehow part of RTRL. Fortunately, there are several YouTube videos on all of them, as well as some pretty good online resources with worked examples:

https://www.youtube.com/watch?v=yecGyZFyfbQ

https://www.youtube.com/watch?v=hYenZlvBwr4

http://neuralnetworksanddeeplearning.com/chap1.html

From what I've gathered so far, the NN has to be an SRN because those are the ones useful for predicting time series. FFN's are useless for that. But that's all I know so far. I'm still grasping the BP algorithm, in order to implement it.

Also, I've found some interesting NN resources. The only library I've found (and liked) which includes an Elman network generator and trainer is "neurolab" for Python:

http://code.google.com/p/neurolab/source/browse/trunk/example/newelm.py?r=26

I like the fact that it's in Python because it would allow me to integrate it easily with a shell script. Still: I'd like to implement my own SRN in C. And also: I haven't found out whether neurolab's newelm uses BPTT or RTRL. I don't have enough knowledge about Python to re-program the whole learning function for Elman, so I'm probably better off writing everything from scratch in C.

A final word about technical analysis though... I'm not sure if NN's fit into the "technical analysis" category. Maybe not ALL technical analysis is fake. In the Randi boards, they mostly mention the typical indicators: MACD, RSI, CCI, etc. Also, the mathematical analysis in the SRN paper from earlier goes pretty deep into statistics. I should be looking into that too. Grasping the statistics surrounding FOREX seems worthwhile.

So who knows... I guess I'll wait and see what my own experiments tell me.

MACD fitness function for the GA

2014-08-17T19:48:00.001-07:00

I have finished programming the MACD fitness function in C. This function takes in the 3 usual parameters for the MACD and returns the balance after the last sell. This function will be useful in implementing the genetic algorithm (GA).

In the following source code, macdbal() is this fitness function. The program works by filling in all the missing candles from the data downloaded by the dl2.sh script from an earlier post. It then calculates the exponential moving averages (each seeded with a simple moving average) used to obtain the MACD. Whenever the signal line (the moving average of the MACD) changes sign, macdbal() "buys" (subtracts the closing ask price from the balance) or "sells" (adds the closing bid price to the balance).

 /*  
      MACD fitness function  
      Vicente Oscar Mier Vela  
      <vomv1988@gmail.com>  
   
      Use the output of "Oanda 5000 candle limit bypasser" script  
      as input for this program.  
   
      Example:  
           $ ./dl2.sh "Aug 3 21:00:00 GMT 2014" "Aug 8 21:00:00 GMT 2014" M1 > 5KOUT  
           $ cat 5KOUT | ./gen2 1407099600 1407531600 60 17 91 99  
   
      The output from above should be:  
           Success rate 29.411765%  
           Final balance: 0.001420  
   
      The first two arguments of gen2.c are the dates used for dl2.sh in UNIX time format.  
   
      For example, use:  
           $ date -d "Aug 3 21:00:00 GMT 2014" +%s  
   
      to obtain  
           1407099600  
 */  
   
 #include <stdio.h>  
 #include <stdlib.h>  
   
 #define closeBid 0  
 #define closeAsk 1  
   
 double macdbal(int emalow, int emahigh, int emasignal, double *closeask, double *closebid, int length);  
   
 int main(int argc, char *argv[]){  
      int startdate = atoi(argv[1]);  
      int enddate = atoi(argv[2]);  
      int timebase = atoi(argv[3]);  
   
      int emalow = atoi(argv[4]);  
      int emahigh = atoi(argv[5]);  
      int emasignal = atoi(argv[6]);  
   
      int nsamples = (enddate - startdate) / timebase;  
      int expdate;  
      int currdate;  
   
      double *samples[2];  
      samples[closeBid] = malloc(sizeof(double)*nsamples);  
      samples[closeAsk] = malloc(sizeof(double)*nsamples);  
   
      int i;  
      for(i = 0, expdate = startdate; i < nsamples; i ++, expdate += timebase){  
           scanf("%d", &currdate);  
 /*  
           The following assumes that the first date from the  
           dataset will always yield a value. That is: 1st date is  
           always "ticked".  
 */  
           if(currdate == expdate){  
                scanf("%lf", samples[closeBid] + i);  
                scanf("%lf", samples[closeAsk] + i);  
           } else {  
                while(currdate != expdate && i < nsamples){  
                     samples[closeBid][i] = samples[closeBid][i-1];  
                     samples[closeAsk][i] = samples[closeAsk][i-1];  
                     expdate += timebase;  
                     i++;  
                }  
                if(i < nsamples){  
                     scanf("%lf", samples[closeBid] + i);  
                     scanf("%lf", samples[closeAsk] + i);  
                }  
           }  
      }  
   
      double x = macdbal(  
                emalow, emahigh, emasignal,  
                samples[closeAsk], samples[closeBid],  
                nsamples  
      );  
   
      printf("Final balance: %f\n",x);  
   
      free(samples[closeBid]);  
      free(samples[closeAsk]);  
   
      return 0;  
 }  
   
 double macdbal(int emalow, int emahigh, int emasignal, double *closeask, double *closebid, int length){  
      int i;  
      double *emalows = malloc(sizeof(double)*length);  
      double *emahighs = malloc(sizeof(double)*length);  
      double *emasignals = malloc(sizeof(double)*length);  
   
      /* zero the work arrays; closebid already holds the samples read by main() */  
      for(i = 0; i < length; i ++){  
           emalows[i] = emahighs[i] = emasignals[i] = 0;  
      }  
   
      double sum;  
      for(i = 0, sum = 0; i < emalow; i ++){  
           emalows[i] = closebid[0];  
           sum += closebid[i];  
      }  
   
      emalows[emalow - 1] = sum / ((double) emalow);  
      double lowmult = 2.0 / (((double) emalow) + 1.0);  
      emalows[emalow] = (closebid[emalow] - emalows[emalow - 1]) * lowmult + emalows[emalow - 1];  
   
      for(i = emalow + 1; i < length; i ++)  
           emalows[i] = (closebid[i] - emalows[i - 1]) * lowmult + emalows[i - 1];  
   
      for(i = 0, sum = 0; i < emahigh; i ++){  
           emahighs[i] = closebid[0];  
           sum += closebid[i];  
      }  
      emahighs[emahigh - 1] = sum / ((double) emahigh);  
      double highmult = 2.0 / (((double) emahigh) + 1.0);  
      emahighs[emahigh] = (closebid[emahigh] - emahighs[emahigh - 1]) * highmult + emahighs[emahigh - 1];  
   
      for(i = emahigh + 1; i < length; i ++)  
           emahighs[i] = (closebid[i] - emahighs[i - 1]) * highmult + emahighs[i - 1];  
   
      int signalstart = emahigh * 2;  
      for(i = 0; i < signalstart + emasignal; i ++)  
           emasignals[i] = 0;  
   
      for(i = signalstart, sum = 0; i < signalstart + emasignal; i ++)  
           sum += emalows[i] - emahighs[i];  
      emasignals[signalstart + emasignal - 1] = sum / ((double) emasignal);  
      double signalmult = 2.0 / (((double) emasignal) + 1.0);  
      emasignals[signalstart + emasignal] =  
           (  
                (emalows[signalstart + emasignal] - emahighs[signalstart + emasignal]) -  
                emasignals[signalstart + emasignal - 1]  
           ) * signalmult + emasignals[signalstart + emasignal - 1]  
      ;  
   
      for(i = signalstart + emasignal + 1; i < length; i ++)  
           emasignals[i] =  
                ((emalows[i] - emahighs[i]) - emasignals[i - 1]) * signalmult + emasignals[i - 1]  
           ;  
   
      double last = 0;  
      double balance = 0;  
      double prevbal = 0;  
      double lastsell = 0;  
      double total = 0;  
      double success = 0;  
      for(i = signalstart + emasignal; i < length; i ++){  
           if(emasignals[i] > 0 && last < 0){  
                last = emasignals[i];  
                prevbal = balance;  
                balance -= closeask[i];  
           }  
           if(emasignals[i] < 0){  
                if(last == 0){  
 /* memo to myself: get ready to buy before starting this loop */  
                     last = emasignals[i];  
                }  
                if(last > 0){  
                     last = emasignals[i];  
                     prevbal = balance;  
                     balance += closebid[i];  
                     if(lastsell < balance)  
                          success ++;  
                     total ++;  
                     lastsell = balance;  
                }  
           }  
      }  
   
 /* remove this for genetic algorithm */  
   
      if(total > 0)  
           printf("Success rate %f%%\n",100*(success/total));  
   
      free(emalows);  
      free(emahighs);  
      free(emasignals);  
   
 /* return balance after last sell */  
      if(last > 0)  
           return prevbal;  
      else  
           return balance;  
   
 }
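The same moving-average recurrence is used three times in macdbal(). As a sketch, it can be isolated into one helper. The function name ema is hypothetical. The seeding with a simple average and the 2 / (period + 1) multiplier follow the program above.

```c
/* Exponential moving average of x[0..n-1] with the given period, written
   to out[0..n-1]. The first period - 1 entries are filled with x[0],
   out[period - 1] is seeded with the simple average of the first
   `period` values, and later entries use the multiplier 2 / (period + 1).
   Returns 0 on success, -1 if period is not in 1..n. */
int ema(const double *x, int n, int period, double *out){
    int i;
    double sum = 0, mult;
    if(period < 1 || period > n)
        return -1;
    for(i = 0; i < period; i++){
        out[i] = x[0];
        sum += x[i];
    }
    out[period - 1] = sum / period;
    mult = 2.0 / (period + 1.0);
    for(i = period; i < n; i++)
        out[i] = (x[i] - out[i - 1]) * mult + out[i - 1];
    return 0;
}
```

With this helper, the MACD is the difference between the short-period and long-period EMAs of the closing bid price, and the signal line is an EMA of that difference.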

Bypassing OANDA's 5000 candle limit

2014-08-13T17:48:00.000-07:00

OANDA limits the number of candles you can download using its HTTP API. If you ever try to download more than 5000 candles, you get the following error message:

{ "code" : 36, "message" : "The value specified is not in the valid range: Resulting candle count is larger than maximum allowed: 5000", "moreInfo" : "http:\/\/developer.oanda.com\/docs\/v1\/troubleshooting\/#errors" }
This limit can be bypassed using the following script:

#!/bin/bash
# OANDA 5000 candle limit bypasser
# Vicente Oscar Mier Vela
# <vomv1988@gmail.com>
# Wed Aug 13 19:45:05 CDT 2014
#
# Example:
#
# ./dl2.sh "Aug 3 21:00:00 GMT 2014" "Aug 8 21:00:00 GMT 2014" S5
#
# This will output 3 element lists composed of
# 1. A line containing the UNIX time of a particular currency price sample
# 2. A line containing the closing bid price
# 3. A line containing the closing ask price
#
# Remember that...
# "No candles are published for intervals where there are no ticks. This will result
# in gaps in between time periods."
# http://developer.oanda.com/docs/v1/rates/#get-current-prices

START=$1
START=`date -d "$START" +%s`
END=$2
END=`date -d "$END" +%s`
TIME=$3

case $TIME in
     S5 ) TIMES=5 ;;
     S10 ) TIMES=10 ;;
     S15 ) TIMES=15 ;;
     S30 ) TIMES=30 ;;
     M1 ) TIMES=60 ;;
     M2 ) TIMES=120 ;;
     M3 ) TIMES=180 ;;
     M5 ) TIMES=300 ;;
     M10 ) TIMES=600 ;;
     M15 ) TIMES=900 ;;
     M30 ) TIMES=1800 ;;
     H1 ) TIMES=3600 ;;
     H2 ) TIMES=7200 ;;
     H3 ) TIMES=10800 ;;
     H4 ) TIMES=14400 ;;
esac

LAPSE=`echo "$END-$START" | bc`
CHUNKS=`echo "$LAPSE/(5000*$TIMES)" | bc`

AUTH="Authorization: Bearer 12345678900987654321-abc34135acde13f13530"
DFORM="X-Accept-Datetime-Format: UNIX"
URL="https://api-fxpractice.oanda.com/v1/candles?instrument=EUR_USD&start=0&end=0&granularity=$TIME"

DATESTART=$START

>samples

if test $CHUNKS -gt 0 ; then
     while test $CHUNKS -gt 0 ; do
          DATEEND=`echo "$DATESTART+(5000*$TIMES)" | bc`
          # echo START $DATESTART = `date -d "@$DATESTART"`
          # echo END $DATEEND = `date -d "@$DATEEND"`
          URLSAMPLE=`echo $URL | sed "s/start=0&end=0/start=$DATESTART\\&end=$DATEEND/"`
          curl -s -H "$AUTH" -H "$DFORM" -X GET $URLSAMPLE >> samples
          DATESTART=`echo "$DATEEND+$TIMES" | bc`
          CHUNKS=`echo "$CHUNKS-1" | bc`
          echo CHUNKS $CHUNKS
     done
     LASTCHUNKSIZE=`echo "$LAPSE%(5000*$TIMES)" | bc`
     DATESTART=`echo "$END-$LASTCHUNKSIZE+$TIMES" | bc`
     DATEEND=$END
     # echo START $DATESTART = `date -d "@$DATESTART"`
     # echo END $DATEEND = `date -d "@$DATEEND"`
     URLSAMPLE=`echo $URL | sed "s/start=0&end=0/start=$DATESTART\\&end=$DATEEND/"`
     curl -s -H "$AUTH" -H "$DFORM" -X GET $URLSAMPLE >> samples
else
     DATESTART=$START
     DATEEND=$END
     # echo START $DATESTART = `date -d "@$DATESTART"`
     # echo END $DATEEND = `date -d "@$DATEEND"`
     URLSAMPLE=`echo $URL | sed "s/start=0&end=0/start=$DATESTART\\&end=$DATEEND/"`
     curl -s -H "$AUTH" -H "$DFORM" -X GET $URLSAMPLE >> samples
fi

cat samples | grep -e time -e closeBid -e closeAsk | sed 's/000000",/",/;s/"//;s/"//g;s/.*: //;s/,//'

I will pipe this script's output into a C program to produce the trading simulation I mentioned in an earlier post.
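The chunking arithmetic can also be written down in C. The sketch below is a simplified variant, not a translation of the script: it splits the requested interval into back-to-back requests and assumes the end of each request is exclusive. If the API includes the end candle, the boundary candle would be downloaded twice and would have to be dropped. The function names are hypothetical.

```c
/* Number of requests needed to cover [start, end) when one candle spans
   `step` seconds and a request may return at most max_candles candles. */
long chunk_count(long start, long end, long step, long max_candles){
    long span = step * max_candles;
    if(end <= start || span <= 0)
        return 0;
    return (end - start + span - 1) / span;   /* ceiling division */
}

/* Bounds [*cs, *ce) of request number idx (0-based). Returns 0 on
   success, -1 if idx is out of range. */
int chunk_bounds(long start, long end, long step, long max_candles,
                 long idx, long *cs, long *ce){
    long span = step * max_candles;
    if(idx < 0 || idx >= chunk_count(start, end, step, max_candles))
        return -1;
    *cs = start + idx * span;
    *ce = *cs + span;
    if(*ce > end)       /* the last request is usually shorter */
        *ce = end;
    return 0;
}
```

For the M1 example used with dl2.sh (1407099600 to 1407531600, 60 seconds per candle), this yields two requests: one full request of 5000 candles and a shorter final one.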

Starting with MACD using shell scripts and C

2014-08-13T11:10:00.001-07:00

I've been interested in FOREX trading since I enrolled in college around 8 years ago, reading pieces of articles about it here and there. I am now a 26-year-old electronics engineer with some graduate school experience in computer science. I think I've matured enough, mathematically and intellectually, to start playing with the FOREX market.

I admit I have no formal or previous experience with trading. This blog is not intended as a guide of any kind. Much less as trading advice. This blog is about the development of my FOREX learning curve, from the very beginning. The notes contained here are for my personal use, but I will share them in case anybody out there wants to collaborate, make suggestions, corrections, criticism, or simply to learn alongside myself.

That said, I'll start by sharing some of the experiences I've collected so far.

I've started by becoming familiar with the basic FOREX trading concepts. I believe I have a good enough grasp on pips, leverage and the MACD to start playing around with an OANDA test account.

Recently I've run some experiments using the OANDA HTTP API, UNIX shell scripts and C programming. So far I haven't been able to turn any profit using my MACD schemes with the OANDA API. I blame a set of factors for this:

The input parameters I've used are based on only a few trading simulations I ran using the EUR_USD price data of 3.5 days (5000 1-minute samples).
My simulations do not entirely represent real trading. I must improve them.
I am using MACD exclusively. I should try combining it with other methods, such as a 1-2-3 scheme and RSI.

I am aware that there exists software which already implements MACD and others. But I've refrained from using such things, mostly because of my obsession with writing my own software, and with being in absolute control of what my software does. Still, I don't reject the idea of using pre-existing software. I think experimenting with pre-existing software may help me turn a profit sooner. It may even help me improve my own software.

To solve the issues which have prevented me from turning a profit with my OANDA test account, I have formulated the following TODO list:

Write a proper trading simulator.
Use this simulator to run a genetic algorithm which finds the optimal MACD parameters (genetic MACD tuning). These parameters include the typical 3 numeric values, and a fourth value representing the time frame for sampling closing bid price values.
Run my MACD shell script with the obtained values.
Copy this trading strategy: http://forex-strategies-revealed.com/simple/123-rsi-macd using only shell scripting and C programming, so as to understand the nuts and bolts of MACD, 1-2-3 and RSI.
Copy the same trading strategy using the suggested pre-existing software packages.

Currently my source code is not ready for publishing. I will polish it and then share it on my GitHub account. I will be reporting as I finish the elements in my TODO list.