Documentation

Getting Started

Everything you need to load, explore, and backtest with your historical forex data.

File Structure

Each currency pair is provided as a separate Parquet file per timeframe. The naming convention is:

EURUSD_M1.parquet    # 1-minute bars
EURUSD_M5.parquet    # 5-minute bars
EURUSD_M15.parquet   # 15-minute bars
EURUSD_H1.parquet    # 1-hour bars
EURUSD_H4.parquet    # 4-hour bars
EURUSD_D1.parquet    # Daily bars
EURUSD_W1.parquet    # Weekly bars

Columns

ColumnTypeDescription
timestampdatetime64[ns]UTC timestamp for the bar open
openfloat64Opening price
highfloat64Highest price during the bar
lowfloat64Lowest price during the bar
closefloat64Closing price
volumeint64Tick volume (number of ticks)
Note: All timestamps are in UTC. Weekend gaps (Friday close to Sunday open) are excluded.

Python / Pandas

Parquet is the native format for pandas and loads instantly, even for large datasets.

Basic Loading

python
import pandas as pd

# Load a single pair
df = pd.read_parquet('EURUSD_M1.parquet')

# View the data
print(df.head())
print(f"Rows: {len(df):,}")
print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")

Loading Multiple Pairs

python
from pathlib import Path

# Load all major pairs
majors = ['EURUSD', 'GBPUSD', 'USDJPY', 'USDCHF', 'AUDUSD', 'USDCAD', 'NZDUSD', 'EURGBP']
data = {}

for pair in majors:
    data[pair] = pd.read_parquet(f'{pair}_H1.parquet')

# Access individual pair
eurusd = data['EURUSD']

Set Timestamp as Index

python
df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')

# Now you can slice by date
df_2023 = df['2023']
df_q1 = df['2023-01':'2023-03']

Resample to Different Timeframes

python
# Load M1 data and resample to H4
df = pd.read_parquet('EURUSD_M1.parquet')
df = df.set_index('timestamp')

df_h4 = df.resample('4H').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}).dropna()

R

Use the arrow package to read Parquet files in R.

R
# Install arrow package (first time only)
install.packages("arrow")

library(arrow)

# Load data
df <- read_parquet("EURUSD_H1.parquet")

# View structure
str(df)
head(df)

# Convert timestamp to POSIXct
df$timestamp <- as.POSIXct(df$timestamp, origin = "1970-01-01", tz = "UTC")

With tidyverse

R
library(arrow)
library(dplyr)
library(lubridate)

df <- read_parquet("EURUSD_D1.parquet") %>%
  mutate(
    timestamp = as_datetime(timestamp),
    year = year(timestamp),
    returns = (close - lag(close)) / lag(close)
  )

# Annual summary
df %>%
  group_by(year) %>%
  summarize(
    total_return = prod(1 + returns, na.rm = TRUE) - 1,
    volatility = sd(returns, na.rm = TRUE) * sqrt(252)
  )

MT4 / MT5

If you purchased the MT4/MT5 format add-on, you'll receive .hst files ready for import into MetaTrader.

MetaTrader 4

  1. Increase max bars: Go to Tools → Options → Charts. Set "Max bars in history" and "Max bars in chart" to 2147483647
  2. Delete existing data (recommended): Go to File → Open Data Folder → history → [server-name]. Delete existing .hst files for pairs you want to import
  3. Import: Open History Center with Tools → History Center or press F2
  4. Select the symbol and timeframe, click Import, then browse to your .hst file
  5. Restart MT4 to see the data on charts

MetaTrader 5

MT5 cannot overwrite existing symbol data. You must create a custom symbol:

  1. Set unlimited bars: Go to Tools → Options → Charts. Set "Max bars in chart" to Unlimited, then restart MT5
  2. Create custom symbol: Go to View → Symbols (or press Ctrl+U). Click Create Custom Symbol and name it (e.g., "EURUSD_HIST")
  3. Import: Select your new symbol, go to the Bars tab, click Import Bars, and select your data file
  4. Your custom symbol will appear under the "Custom" folder in Market Watch
Tip: After import, use the Strategy Tester's "Every tick based on real ticks" mode (MT5) or "Every tick" mode (MT4) for the most accurate backtests.

CSV / Excel

If you purchased the CSV format add-on, you can open the files directly in Excel or any spreadsheet application.

Excel

  1. Open Excel
  2. Go to Data → From Text/CSV
  3. Select the CSV file
  4. Excel will auto-detect the delimiter and columns
Note: M1 files contain millions of rows. Excel has a 1,048,576 row limit. For M1 data, use Python or load specific date ranges.

Convert Parquet to CSV (Python)

python
import pandas as pd

# Convert to CSV
df = pd.read_parquet('EURUSD_D1.parquet')
df.to_csv('EURUSD_D1.csv', index=False)

# Or filter first to reduce size
df_2023 = df[df['timestamp'].dt.year == 2023]
df_2023.to_csv('EURUSD_D1_2023.csv', index=False)

Backtesting Examples

Common patterns for running backtests with the historical data.

Simple Moving Average Crossover

python
import pandas as pd
import numpy as np

df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')

# Calculate moving averages
df['sma_20'] = df['close'].rolling(20).mean()
df['sma_50'] = df['close'].rolling(50).mean()

# Generate signals
df['signal'] = np.where(df['sma_20'] > df['sma_50'], 1, -1)
df['signal'] = df['signal'].shift(1)  # Avoid lookahead bias

# Calculate returns
df['returns'] = df['close'].pct_change()
df['strategy_returns'] = df['signal'] * df['returns']

# Performance
total_return = (1 + df['strategy_returns']).prod() - 1
print(f"Strategy Return: {total_return:.2%}")

With Backtesting.py

python
from backtesting import Backtest, Strategy
from backtesting.lib import crossover
import pandas as pd

class SmaCross(Strategy):
    n1 = 20
    n2 = 50

    def init(self):
        self.sma1 = self.I(lambda x: pd.Series(x).rolling(self.n1).mean(), self.data.Close)
        self.sma2 = self.I(lambda x: pd.Series(x).rolling(self.n2).mean(), self.data.Close)

    def next(self):
        if crossover(self.sma1, self.sma2):
            self.buy()
        elif crossover(self.sma2, self.sma1):
            self.sell()

# Load data
df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')
df.columns = ['Open', 'High', 'Low', 'Close', 'Volume']

bt = Backtest(df, SmaCross, cash=10000, commission=.0002)
stats = bt.run()
print(stats)

Multi-Pair Analysis

python
import pandas as pd

pairs = ['EURUSD', 'GBPUSD', 'USDJPY', 'AUDUSD']
results = {}

for pair in pairs:
    df = pd.read_parquet(f'{pair}_D1.parquet')
    df = df.set_index('timestamp')

    # Calculate daily returns
    df['returns'] = df['close'].pct_change()

    results[pair] = {
        'mean_return': df['returns'].mean() * 252,  # Annualized
        'volatility': df['returns'].std() * np.sqrt(252),
        'sharpe': (df['returns'].mean() / df['returns'].std()) * np.sqrt(252)
    }

pd.DataFrame(results).T

Available Timeframes

CodeTimeframeBars per Day~Rows (25 years)
M11 Minute1,440~9.4 million
M55 Minutes288~1.9 million
M1515 Minutes96~624,000
H11 Hour24~156,000
H44 Hours6~39,000
D1Daily1~6,500
W1Weekly0.2~1,300

FAQ

What timezone are the timestamps in?

All timestamps are in UTC. This is the standard for forex data and avoids daylight saving time issues.

Are weekend gaps included?

No. The data runs from Sunday ~22:00 UTC to Friday ~22:00 UTC each week. Weekend periods with no trading are excluded.

What is tick volume?

Tick volume counts the number of price changes (ticks) during each bar. It's a proxy for activity/liquidity, though not actual traded volume (forex is decentralized).

Why Parquet instead of CSV?

Parquet is 5-10x smaller than CSV and loads 10-100x faster. A 2GB CSV becomes a 200MB Parquet file that loads in seconds. It's the standard format for data science workflows.

How do I convert to CSV?

See the CSV section above. One line of Python: df.to_csv('file.csv')

Is bid/ask spread included?

The data contains mid prices (average of bid and ask). For backtesting, add realistic spread costs: ~0.5-1 pip for majors, 1-3 pips for minors/exotics.

Can I use this for machine learning?

Yes. The Parquet format works natively with pandas, scikit-learn, PyTorch, and TensorFlow. Many quants use this data for training price prediction and regime detection models.

Still have questions?

We're here to help you get the most out of your data.

Contact Support