FOREX | UNIX: February 2015

Sunday, February 22, 2015

My interpretation of Kuperin's paper on price forecasting

When I was trying to reproduce Kuperin et al's experiments:

http://link.springer.com/chapter/10.1007/3-540-47789-6_127
http://arxiv.org/abs/cond-mat/0304469

I produced the following work:

https://www.dropbox.com/s/doyi1lm7ugc7e7f/elmold.tar.gz?dl=0

The file is large because it contains the output data from several weeks of continuous processing. The programs are Elman network trainers based on various genetic algorithms. An animation of one of my experiments can be watched in:

https://www.youtube.com/watch?v=Yz1NYskvFg0

The results were disappointing, so I decided not to upload anything until now. I've uploaded this work only to keep a historical record.

A (crude) contribution to agent-based market simulation, and some ideas for the future

A while ago I wrote the skeleton for an agent-based market simulator. It's basically just the market-maker with an order book and a couple of unintelligent investors. The investors just place buy and sell orders randomly, and drive the price up or down.

/*
Market simulation
<vomv1988@gmail.com>
Sat Jan 31 13:03:54 CST 2015

Based on
http://www.21stcenturyinvestoreducation.com/page/tce/courses/course-101/004/001-stock-prices.html

I wrote the following market simulator in which the market maker takes a spread of $5 from a commodity which tan take a price from $1 to $100.

The investors place random limit orders, and the market maker decides the optimal price in order to guarantee the greatest possible liquidity.
*/

‪#‎include‬ <stdio.h>
#include <stdlib.h>

int iabs(int x);

int main(){
int **orderbook;
int price, buysell;

int **orders;
int id, ordprice, order;

int pricerange = 100;
orderbook = (int **) malloc(sizeof(int *) * pricerange);

int orderslen = 1000;
orders = (int **) malloc(sizeof(int *) * orderslen);

int i;
for(i = 0; i < pricerange; i ++)
orderbook[i] = malloc(sizeof(int) * 2);

for(i = 0; i < orderslen; i ++)
orders[i] = malloc(sizeof(int) * 3);

for(i = 0; i < orderslen; i ++){
id = i;
ordprice = rand() % pricerange + 1;
order = rand() % 11 - 5;
orders[i][0] = id;
orders[i][1] = ordprice;
orders[i][2] = order;
}

int j;
for(i = 0; i < pricerange; i ++){
price = i + 1;
orderbook[i][0] = 0;
orderbook[i][1] = 0;
for(j = 0; j < orderslen; j ++)
if(orders[j][1] == price)
if((buysell = orders[j][2]) < 0)
orderbook[i][0] += buysell;
else if((buysell = orders[j][2]) > 0)
orderbook[i][1] += buysell;
}
/*
for(i = 0; i < orderslen; i ++)
printf(
"ID: %d, PRICE: %d, ORDER: %d\n",
orders[i][0],
orders[i][1],
orders[i][2]
);

for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/

int totalorders = 0;

int marketprice = 20;
int spread = 5;
int sold = 0;
int bought = 0;
int transactions;
int bestprice = 0;
int bestorders = totalorders;

int ticks = 1000;

int k;

for(j = 0; j < ticks; j ++){
totalorders = 0;
for(i = 0; i < pricerange; i ++)
totalorders += -1 * orderbook[i][0] + orderbook[i][1];

bestorders = totalorders;
bestprice = 0;

for(marketprice = 1; marketprice <= pricerange; marketprice ++){
sold = bought = 0;
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price <= marketprice + spread)
bought += orderbook[i][0];
if(price >= marketprice)
sold += orderbook[i][1];
}
transactions = bought + sold;

if(iabs(transactions) < iabs(bestorders)){
bestorders = transactions;
bestprice = marketprice;
}
/*
printf(
"MARKETPRICE: %d, BOUGHT: %d, SOLD: %d, TRANSACTIONS: %d\n",
marketprice, bought, sold, transactions
);
*/
}

printf("BESTPRICE: %d, BESTTRANS: %d\n", bestprice, bestorders);

// printf("EXECUTE ORDERS\n");
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price <= bestprice)
orderbook[i][0] = 0;
if(price >= bestprice + spread)
orderbook[i][1] = 0;
}
/*
for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/
// printf("TAKE NEW ORDERS\n");
for(i = 0; i < orderslen; i ++){
id = i;
ordprice = rand() % pricerange + 1;
order = rand() % 11 - 5;
orders[i][0] = id;
orders[i][1] = ordprice;
orders[i][2] = order;
}

for(i = 0; i < pricerange; i ++){
price = i + 1;
orderbook[i][0] = 0;
orderbook[i][1] = 0;
for(k = 0; k < orderslen; k ++)
if(orders[k][1] == price)
if((buysell = orders[k][2]) < 0 && price >= bestprice)
orderbook[i][0] += buysell;
else if((buysell = orders[k][2]) > 0 && price <= bestprice + spread)
orderbook[i][1] += buysell;
}
/*
for(i = 0; i < pricerange; i ++){
price = i + 1;
if(price >= bestprice)
orderbook[i][0] += -1 * (rand() % 10 + 1);
if(price <= bestprice + spread)
orderbook[i][1] += rand() % 10 + 1;
}

for(i = 0; i < pricerange; i ++)
printf(
"PRICE: %d, SELLERS: %d, BUYERS: %d\n",
i + 1,
orderbook[i][0],
orderbook[i][1]
);
*/
}

for(i = 0; i < pricerange; i ++)
free(orderbook[i]);

for(i = 0; i < orderslen; i ++)
free(orders[i]);

free(orderbook);
free(orders);

return 0;
}

int iabs(int x){
if(x < 0)
return -1 * x;
else
return x;
}

Here's an example of price fluctuations produced by this code:

The following tarball contains additional code examples. In some source files, the price volatility can be regulated by setting a range for order volumes. In others, there's an example of front-running price prediction.

https://www.dropbox.com/s/g01i5qntffrekpb/simcom.tar.gz?dl=0

I would have to endow the investors in the simulation with some degree of intelligence in order to make it more realistic. But even if I followed the ideas in http://jasss.soc.surrey.ac.uk/6/3/3.html, there would be some fundamental flaws.

The agent-based simulations I've seen so far seem to only take into consideration past price fluctuations to determine artificial investors' buy/sell decisions. As if, in real life, real investors only took these things into considerations in their decision to buy and sell. In reality, investors also read the news. Some may even base their investment on irrational decisions, such as fortune-telling, the zodiac, or "gut-feelings".

A better model would look something like this:

Factors colored in red are ones I haven't seen taken into account in the studies I've read so far. They also seem extremely hard to predict. I think predicting the market requires predicting things such as wold events or human history. A natural catastrophe can drive stock and currency prices down, as can a civil war.

I should add that, at this point, I'm starting to think that successful investors are more the product of chance and luck, than the product of "intelligent investing". I'm extremely skeptical a to whether a thing such as "intelligent investment" even exists, in the traditional market sense.

This also makes me think that the attempts of so-called "economic analysts" to predict the behavior of markets, is also futile. They may think they're doing science, but I think they're far from it. I think economics is in a proto-scientific stage as of now. I see no near future breakthrus in market forecasting. I'm not saying it's hopless. Just very misinformed, and young. There may be a scientific future to market forecasting, but in the far, far future.

In a practical sense, to me, this means that economics is a field much younger than it's professionals might make it seem. At least scientifically. It sounds like an exciting field to try to do science in. But I should expect no significant advances as of today. There is still too much to be found out about human behavior, in order to make economics a viable science.

I will probably be taking a break from this line of research. Perhaps, never to return. It's been a pleasant experience nevertheless.

Cleaner code and some interesting links

Here's some new code I wrote. I think this is the last code I'll write for trying to predict forex trends using neural networks. It's useless for predicting forex trends, but it has some cool examples of simple recurrent neural networks in C.

https://www.dropbox.com/s/b5cxga1sthtqzds/predict.tar.gz?dl=0

By the way, the time series I used for the experiments in the tarball was taken from

http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.xml

And here are some links with ideas related to what I mentioned in my last post about considering markets as complex systems composed of simpler, smaller parts. I've to say I do consider the examples in the links to be gross oversimplifications. But so do the authors, apparently.

http://www.altreva.com/
http://jasss.soc.surrey.ac.uk/6/3/3.html
http://pages.stern.nyu.edu/~adamodar/New_Home_Page/invfables/investorirrationality.htm

Monday, February 16, 2015

Some conclusions about the behavior of markets

After a couple of months of attempting to apply my interpretation of Kuperin et al's method to the prediction of forex market trends with neural networks, I've come to a few of realizations.

I've concluded that neural networks are, in fact, insufficient for predicting market trends. I've also come to believe that technical analysis is, as a whole, just a pseudoscience. I've come to believe these things because I've noticed several recurrent flaws in the studies that presume to demonstrate the efficacy of these forecasting methods.

Take neural networks, for example. The way you test the efficacy of a neural network in predicting market trends is by taking a series of steps:

1. First, you have to choose a network architecture.
2. Then, you acquire some large dataset to train and test the neural network you picked.
3. You train your neural network using a section of this dataset.
4. You "tell" the neural network to make a prediction, and see how close the network's prediction is to the actual data, taken from the same dataset.

One problem lies in the fact that, in step 1, you can choose from an enormous set of different types of neural networks. There are infinitely many different neural networks to choose from, and some of them will perform better than others. Does finding one which produces statistically significant results for a specific dataset really mean anything?

If you choose some neural network architecture, set it's number of neurons, you tune the sigmoid function for it's neurons, and then run your tests, and it turns out that the quality of the predictions is unimpressive, what do you do? You just change the parameters, and try again. You try a different architecture, different number of neurons, alter the sigmoid, and let's say that, again, you find nothing interesting. So you keep changing the parameters until you come up with a neural network that seems to work.

But what's the actual extent of that neural network's efficacy? Maybe it only works for the dataset you're using. Maybe if you used the data of 10 years from now, the same neural network would produce bad predictions.

So you ask yourself these questions, and you decide to make another test. You train and test your neural network using data from a specific period, say, the usd/euro fluctuations from year 2006. And then you test the same network architecture, by training and testing it using data from 2007, 2008, 2009...

But even if you find a neural network that works along that whole time line, wouldn't you be falling in the same survivorship bias as before? That is: it makes no difference whether you test your network with 2006 data first, and then with 2007, 2008, 2009... data, if you're going to throw it away if it turns out that it doesn't work along the whole time-line. That's basically the same thing you did before! It seems to me that this would just be cherry-picking the "best" neural network, out of the infinite set of possible different neural networks. There's still no guarantee that, 10 years from now, that same network you picked will produce good predictions.

This is no way to do science.

In science, you should have some degree of control over the "independent variables" which affect the outcome you're attempting to predict. Technical analysis assumes that future fluctuations can be predicted, to a degree, from the information of past fluctuations in price. From technical analysis, you could consider the "independent variables" to be a set of past price fluctuations. And the "dependent variable" would be the future fluctuations, which you are trying to predict.

The assumption is that there exists some mathematical formula which takes in it's input a set of past fluctuations, and spits to it's output a set of future fluctuations. An example of this is the MACD indicator. But MACD suffers from the same survivorship bias problem I laid out before. If you intend to use MACD to predict price fluctuations, you should know that MACD uses a set of parameters. The MACD is actually "tunable". So, if your MACD starts to fail, you can always tune it to "fix it". But then, does the MACD really work at all? Or are you just fooling yourself?

Tuning the MACD, or a neural network architecture for that matter, to fit the data, is the same as changing your theory every time you're confronted with contradicting results. What if there's really no mathematical relation between the past of price fluctuations, and the future of them? I mean, you keep scrambling your formula to try to come up with one that produces good predictions. But if the fluctuations are random, your formula may just be some locally optimal formula that works for your particular dataset. Again: no guarantee that it'll work for future datasets.

You really have no control over the behavior of markets in order to do the kinds of experiments required to extract underlying principles for their behavior. You may think that using different datasets to test your formula is a sort of "control" over the input parameters for the formula. That is: you can use a different set of inputs and outputs, to see if your formula works there too. But you're not controlling the underlying causes that made the data be the way it is. You're just hoping to find some periodicity in a time series. This is not at all the same as stimulating a system (i.e., the market) to find causal relationships.

Maybe there IS a formula that can predict market fluctuations. But I think the correct way to construct this formula, is not by scrambling the weights in a neural network, or the parameters in a MACD, or any other technical analysis indicator. The correct formula should be based on principles extracted from the behavior of economic agents. From an accurate and tested model of the behavior of investors.

I think the correct approach would be to think of markets in the way we think about the weather. People who study the behavior of weather apply the principles of the dynamics of fluids to the complex system that is the weather. That is: principles which have been extracted from controlled lab experiments, where we can actually control the independent variables which affect the physical behaviors of simple components of the weather, such as the air, or water.

The way to construct a correct mathematical model for the behavior of markets would be to extract the principles which govern their simplest components. In the case of markets, the components would be behavior of individual investors, the behavior of companies, and societies, and even the behavior of natural catastrophes or events which might influence an investor's decision to buy or sell. The market is, in fact, an enormously complex system, which involves all these things, and maybe more.

Using something like a neural network with 200 neurons to try to predict the fluctuations produced by such a beast seems to me like a long shot. The correct model would have to be, from my point of view, in all likelihood, much, much more complex than that. Not to mention that the very methodology of using past fluctuation data to try to predict future fluctuations is just wrong. Trying to find periodicity is not the same as modeling.

An example of causal relationships in markets can be learned, for example, from "front running". Front running is a kind of market fraud that takes advantage of a particular market inefficiency which guarantees a rise or fall in price. So people who invest based on front-running actually KNOW that the price is going to fall or rise. The fact that it's documented as a (mostly) certain way of predicting the market which, because of this, is illegal, is proof that there exist basic principles which govern the markets, and allow people to make accurate price predictions. I'm not saying that we should start a front-running scheme. I'm just saying that the underlying principles exist, and they've even been documented and classified as "cheating". Maybe it's possible to extract some other principles, through observation or experiments, in order to produce a better predictive model for price fluctuations.