Documentation
Getting Started
Everything you need to load, explore, and backtest with your historical forex data.
File Structure
Each currency pair is provided as a separate Parquet file per timeframe. The naming convention is:
EURUSD_M1.parquet # 1-minute bars
EURUSD_M5.parquet # 5-minute bars
EURUSD_M15.parquet # 15-minute bars
EURUSD_H1.parquet # 1-hour bars
EURUSD_H4.parquet # 4-hour bars
EURUSD_D1.parquet # Daily bars
EURUSD_W1.parquet # Weekly barsColumns
| Column | Type | Description |
|---|---|---|
| timestamp | datetime64[ns] | UTC timestamp for the bar open |
| open | float64 | Opening price |
| high | float64 | Highest price during the bar |
| low | float64 | Lowest price during the bar |
| close | float64 | Closing price |
| volume | int64 | Tick volume (number of ticks) |
Python / Pandas
Parquet is the native format for pandas and loads instantly, even for large datasets.
Basic Loading
import pandas as pd
# Load a single pair
df = pd.read_parquet('EURUSD_M1.parquet')
# View the data
print(df.head())
print(f"Rows: {len(df):,}")
print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")Loading Multiple Pairs
from pathlib import Path
# Load all major pairs
majors = ['EURUSD', 'GBPUSD', 'USDJPY', 'USDCHF', 'AUDUSD', 'USDCAD', 'NZDUSD', 'EURGBP']
data = {}
for pair in majors:
data[pair] = pd.read_parquet(f'{pair}_H1.parquet')
# Access individual pair
eurusd = data['EURUSD']Set Timestamp as Index
df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')
# Now you can slice by date
df_2023 = df['2023']
df_q1 = df['2023-01':'2023-03']Resample to Different Timeframes
# Load M1 data and resample to H4
df = pd.read_parquet('EURUSD_M1.parquet')
df = df.set_index('timestamp')
df_h4 = df.resample('4H').agg({
'open': 'first',
'high': 'max',
'low': 'min',
'close': 'last',
'volume': 'sum'
}).dropna()R
Use the arrow package to read Parquet files in R.
# Install arrow package (first time only)
install.packages("arrow")
library(arrow)
# Load data
df <- read_parquet("EURUSD_H1.parquet")
# View structure
str(df)
head(df)
# Convert timestamp to POSIXct
df$timestamp <- as.POSIXct(df$timestamp, origin = "1970-01-01", tz = "UTC")With tidyverse
library(arrow)
library(dplyr)
library(lubridate)
df <- read_parquet("EURUSD_D1.parquet") %>%
mutate(
timestamp = as_datetime(timestamp),
year = year(timestamp),
returns = (close - lag(close)) / lag(close)
)
# Annual summary
df %>%
group_by(year) %>%
summarize(
total_return = prod(1 + returns, na.rm = TRUE) - 1,
volatility = sd(returns, na.rm = TRUE) * sqrt(252)
)MT4 / MT5
If you purchased the MT4/MT5 format add-on, you'll receive .hst files ready for import into MetaTrader.
MetaTrader 4
- Increase max bars: Go to
Tools → Options → Charts. Set "Max bars in history" and "Max bars in chart" to2147483647 - Delete existing data (recommended): Go to
File → Open Data Folder → history → [server-name]. Delete existing.hstfiles for pairs you want to import - Import: Open History Center with
Tools → History Centeror press F2 - Select the symbol and timeframe, click Import, then browse to your
.hstfile - Restart MT4 to see the data on charts
MetaTrader 5
MT5 cannot overwrite existing symbol data. You must create a custom symbol:
- Set unlimited bars: Go to
Tools → Options → Charts. Set "Max bars in chart" to Unlimited, then restart MT5 - Create custom symbol: Go to
View → Symbols(or press Ctrl+U). Click Create Custom Symbol and name it (e.g., "EURUSD_HIST") - Import: Select your new symbol, go to the Bars tab, click Import Bars, and select your data file
- Your custom symbol will appear under the "Custom" folder in Market Watch
CSV / Excel
If you purchased the CSV format add-on, you can open the files directly in Excel or any spreadsheet application.
Excel
- Open Excel
- Go to Data → From Text/CSV
- Select the CSV file
- Excel will auto-detect the delimiter and columns
Convert Parquet to CSV (Python)
import pandas as pd
# Convert to CSV
df = pd.read_parquet('EURUSD_D1.parquet')
df.to_csv('EURUSD_D1.csv', index=False)
# Or filter first to reduce size
df_2023 = df[df['timestamp'].dt.year == 2023]
df_2023.to_csv('EURUSD_D1_2023.csv', index=False)Backtesting Examples
Common patterns for running backtests with the historical data.
Simple Moving Average Crossover
import pandas as pd
import numpy as np
df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')
# Calculate moving averages
df['sma_20'] = df['close'].rolling(20).mean()
df['sma_50'] = df['close'].rolling(50).mean()
# Generate signals
df['signal'] = np.where(df['sma_20'] > df['sma_50'], 1, -1)
df['signal'] = df['signal'].shift(1) # Avoid lookahead bias
# Calculate returns
df['returns'] = df['close'].pct_change()
df['strategy_returns'] = df['signal'] * df['returns']
# Performance
total_return = (1 + df['strategy_returns']).prod() - 1
print(f"Strategy Return: {total_return:.2%}")With Backtesting.py
from backtesting import Backtest, Strategy
from backtesting.lib import crossover
import pandas as pd
class SmaCross(Strategy):
n1 = 20
n2 = 50
def init(self):
self.sma1 = self.I(lambda x: pd.Series(x).rolling(self.n1).mean(), self.data.Close)
self.sma2 = self.I(lambda x: pd.Series(x).rolling(self.n2).mean(), self.data.Close)
def next(self):
if crossover(self.sma1, self.sma2):
self.buy()
elif crossover(self.sma2, self.sma1):
self.sell()
# Load data
df = pd.read_parquet('EURUSD_H1.parquet')
df = df.set_index('timestamp')
df.columns = ['Open', 'High', 'Low', 'Close', 'Volume']
bt = Backtest(df, SmaCross, cash=10000, commission=.0002)
stats = bt.run()
print(stats)Multi-Pair Analysis
import pandas as pd
pairs = ['EURUSD', 'GBPUSD', 'USDJPY', 'AUDUSD']
results = {}
for pair in pairs:
df = pd.read_parquet(f'{pair}_D1.parquet')
df = df.set_index('timestamp')
# Calculate daily returns
df['returns'] = df['close'].pct_change()
results[pair] = {
'mean_return': df['returns'].mean() * 252, # Annualized
'volatility': df['returns'].std() * np.sqrt(252),
'sharpe': (df['returns'].mean() / df['returns'].std()) * np.sqrt(252)
}
pd.DataFrame(results).TAvailable Timeframes
| Code | Timeframe | Bars per Day | ~Rows (25 years) |
|---|---|---|---|
| M1 | 1 Minute | 1,440 | ~9.4 million |
| M5 | 5 Minutes | 288 | ~1.9 million |
| M15 | 15 Minutes | 96 | ~624,000 |
| H1 | 1 Hour | 24 | ~156,000 |
| H4 | 4 Hours | 6 | ~39,000 |
| D1 | Daily | 1 | ~6,500 |
| W1 | Weekly | 0.2 | ~1,300 |
FAQ
What timezone are the timestamps in?
All timestamps are in UTC. This is the standard for forex data and avoids daylight saving time issues.
Are weekend gaps included?
No. The data runs from Sunday ~22:00 UTC to Friday ~22:00 UTC each week. Weekend periods with no trading are excluded.
What is tick volume?
Tick volume counts the number of price changes (ticks) during each bar. It's a proxy for activity/liquidity, though not actual traded volume (forex is decentralized).
Why Parquet instead of CSV?
Parquet is 5-10x smaller than CSV and loads 10-100x faster. A 2GB CSV becomes a 200MB Parquet file that loads in seconds. It's the standard format for data science workflows.
How do I convert to CSV?
See the CSV section above. One line of Python: df.to_csv('file.csv')
Is bid/ask spread included?
The data contains mid prices (average of bid and ask). For backtesting, add realistic spread costs: ~0.5-1 pip for majors, 1-3 pips for minors/exotics.
Can I use this for machine learning?
Yes. The Parquet format works natively with pandas, scikit-learn, PyTorch, and TensorFlow. Many quants use this data for training price prediction and regime detection models.