Usage

Example using csv_loader

from datetime import datetime
from pathlib import Path

import pandas as pd

from fast_csv_loader import csv_loader

file = Path("path/to/your/timeseries_data.csv")

# Load the last 160 lines (the default `period`) from the end of the file
df = csv_loader(file_path=file)

print(df)

# Load the last 500 lines of data up to 10th May 2024
df = csv_loader(
    file_path=file,
    period=500,
    end_date=datetime(2024, 5, 10),
)

# If the datetime column is not named `Date` (the default), specify it using `date_column`
df = csv_loader(file_path=file, date_column="time")

# If pandas is unable to parse the date column, specify the format using `date_format`
df = csv_loader(file_path=file, date_format="%d/%m/%Y")

# Increase `chunk_size` to tune performance for your specific needs.
df = csv_loader(file_path=file, chunk_size=1024 * 10)

Example using cached_csv_loader

Note

cached_csv_loader was introduced in version 2.2.0

from pathlib import Path

from fast_csv_loader import (
    cached_csv_loader,
    invalidate, invalidate_all,
    cache_stats, set_max_cache_size,
)

df = cached_csv_loader(Path("AAPL.csv"), period=200)   # disk read
df = cached_csv_loader(Path("AAPL.csv"), period=200)   # cache hit

# Entries are invalidated automatically when the file's mtime changes
# (e.g. after an EOD sync overwrites the file).
# To invalidate explicitly:
invalidate("AAPL.csv")
invalidate_all()

# Observability:
cache_stats()
# {'hits': 49, 'misses': 1, 'evictions': 0, 'size': 1, 'hit_rate': 98.0, 'max_size': 500}

API

fast_csv_loader.csv_loader(file_path: Path, period: int = 160, end_date: datetime | None = None, date_format: str | None = None, use_columns: List[str] | None = None, chunk_size: int = 6144) → DataFrame

Load a CSV file with timeseries data in chunks from the end.

May return an empty DataFrame if no data is found. Check df.empty before further processing.
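The idea of loading in chunks from the end can be illustrated with a minimal standard-library sketch. This is not the library's implementation: `tail_lines` is a hypothetical helper that returns raw text rows instead of a DataFrame, and it assumes a single header row and newline-terminated records.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def tail_lines(file_path: Path, period: int = 160, chunk_size: int = 6144) -> List[str]:
    """Return up to `period` trailing data rows, reading backwards in chunks."""
    if period <= 0:
        return []
    with file_path.open("rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b""
        # Prepend chunks until the buffer holds more than `period` newlines
        # (so the leading, possibly partial, line can be discarded) or the
        # start of the file is reached.
        while position > 0 and buffer.count(b"\n") <= period:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
    lines = buffer.decode("utf-8", errors="replace").splitlines()
    if position == 0:
        lines = lines[1:]  # the whole file was read, so drop the header row
    return lines[-period:]


# Demo on a throwaway file with made-up data.
rows = [f"2024-01-{day:02d},{day}" for day in range(1, 11)]
demo_file = Path(tempfile.mkdtemp()) / "demo.csv"
demo_file.write_text("Date,Close\n" + "\n".join(rows) + "\n")
print(tail_lines(demo_file, period=3, chunk_size=16))
```

An empty or header-only file yields an empty list here, which mirrors the empty-DataFrame case described above.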

Parameters:
  • file_path (pathlib.Path) – The path to the CSV file to be loaded.

  • period (int) – Number of lines/candles to return. The default is 160.

  • end_date (Optional[datetime]) – Load the last N lines up to this date. If None, load the last N lines from the end of the file.

  • date_format (Optional[str]) – Custom date format in case pandas is unable to parse the date column.

  • use_columns (Optional[List[str]]) – List of column names to load from the CSV file. If None (the default), all columns are loaded.

  • chunk_size (int) – The size of data chunks loaded into memory. The default is 6144 bytes (6 KB).

Returns:

A DataFrame containing the loaded timeseries data.

Return type:

pd.DataFrame

Raises:

IndexError – If end_date is provided but falls outside the date range of the data.
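Callers may want to handle both the empty result and the IndexError in one place. The sketch below shows one way to do that. `load_or_none` is a hypothetical helper, not part of the package, and the stand-in loaders exist only so the snippet runs without fast_csv_loader installed; in practice `csv_loader` would be passed as `loader`.

```python
from datetime import datetime
from types import SimpleNamespace


def load_or_none(loader, **kwargs):
    """Call `loader(**kwargs)`; return None on IndexError or an empty result."""
    try:
        df = loader(**kwargs)
    except IndexError:
        # end_date was outside the range of the data
        return None
    return None if df.empty else df


# Stand-ins that mimic the two documented outcomes (assumptions for the demo).
def out_of_range_loader(**kwargs):
    raise IndexError("end_date outside data range")


def empty_loader(**kwargs):
    return SimpleNamespace(empty=True)


print(load_or_none(out_of_range_loader, end_date=datetime(1990, 1, 1)))
print(load_or_none(empty_loader))
```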

fast_csv_loader.cached_csv_loader(file_path: Path, period: int = 160, end_date: datetime | None = None, date_format: str | None = None, use_columns: List[str] | None = None, chunk_size: int = 6144) → DataFrame

Added in version 2.2.0.

Mtime-aware cached wrapper around csv_loader.

Provides a drop-in replacement for csv_loader for cases where the same CSV file may be read multiple times within the same process. Results are cached based on file path, modification time, and selected query parameters.

The cache key is composed of (file_path, end_date, date_format, use_columns). The period parameter is NOT part of the cache key and is applied after cache retrieval, meaning different period values reuse the same cached DataFrame and only affect the returned slice.

If the underlying file has changed (based on mtime), the cache entry is invalidated and the file is reloaded.
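The caching behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `MtimeCache` is a hypothetical class, it caches every data row as plain text rather than a DataFrame, and its key uses only the path and `end_date`.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


class MtimeCache:
    """Cache file rows per key and reload when the file's mtime changes."""

    def __init__(self) -> None:
        self._entries: Dict[tuple, Tuple[int, List[str]]] = {}
        self.disk_reads = 0

    def load(self, file_path: Path, period: int = 160, end_date=None) -> List[str]:
        key = (str(file_path), end_date)  # `period` is deliberately not in the key
        mtime = file_path.stat().st_mtime_ns
        entry = self._entries.get(key)
        if entry is None or entry[0] != mtime:
            # Cache miss or stale entry: read from disk, skipping the header row
            rows = file_path.read_text().splitlines()[1:]
            self._entries[key] = (mtime, rows)
            self.disk_reads += 1
        else:
            rows = entry[1]
        # `period` only selects the slice returned from the cached rows
        return rows[-period:]


# Demo on a throwaway file with made-up data.
demo_file = Path(tempfile.mkdtemp()) / "demo.csv"
demo_file.write_text("Date,Close\n2024-01-01,1\n2024-01-02,2\n2024-01-03,3\n")
cache = MtimeCache()
cache.load(demo_file, period=2)  # disk read
cache.load(demo_file, period=1)  # cache hit, different slice
# Overwrite the file and bump its mtime explicitly so the change is detectable
demo_file.write_text("Date,Close\n2024-01-04,4\n")
stat = demo_file.stat()
os.utime(demo_file, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
cache.load(demo_file, period=5)  # stale entry, so the file is reloaded
print(cache.disk_reads)
```

Keeping `period` out of the key means a request for 200 rows and a request for 50 rows share one entry, at the cost of caching more rows than any single call needs.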

Parameters:
  • file_path (pathlib.Path) – The path to the CSV file to be loaded.

  • period (int) – Number of rows/candles to return from the end of the dataset. Default is 160.

  • end_date (Optional[datetime]) – Load data up to this timestamp. If None, the most recent data is used. If provided, loading is anchored to this date.

  • date_format (Optional[str]) – Custom datetime format string used for parsing the CSV date column if automatic parsing fails.

  • use_columns (Optional[List[str]]) – List of column names to load from the CSV file. If None, all columns are loaded.

  • chunk_size (int) – Size of chunks (in bytes) used when reading the CSV file. Default is 6144 bytes (6 KB).

Returns:

A DataFrame containing the requested slice of timeseries data.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If file_path does not exist.

fast_csv_loader.invalidate(file_path) → int

Added in version 2.2.0.

Drop all cache entries for a given file (any end_date / columns combination).

Useful after writing new data to disk when you want to ensure subsequent reads do not return stale cached results. Otherwise, cache entries are invalidated automatically based on file modification time.

Parameters:

file_path (pathlib.Path | str) – Path of the file whose cache entries should be removed.

Returns:

Number of cache entries removed for the given file.

Return type:

int
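As an illustration of per-file invalidation, the sketch below removes every entry whose key starts with the given path, whatever the other key components are. `drop_entries` and the tuple key layout are assumptions made for the example, not the library's internals.

```python
from pathlib import Path
from typing import Union


def drop_entries(cache: dict, file_path: Union[Path, str]) -> int:
    """Remove all entries keyed on `file_path`; return how many were removed."""
    target = str(file_path)
    stale = [key for key in cache if key[0] == target]
    for key in stale:
        del cache[key]
    return len(stale)


# Hypothetical cache keyed by (path, end_date, use_columns)
demo_cache = {
    ("AAPL.csv", None, None): "all columns",
    ("AAPL.csv", "2024-05-10", ("Close",)): "close only",
    ("MSFT.csv", None, None): "all columns",
}
print(drop_entries(demo_cache, Path("AAPL.csv")))
```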

fast_csv_loader.invalidate_all() → int

Added in version 2.2.0.

Drop all entries from the cache.

Useful for resetting cache state entirely, for example during testing or after bulk data updates.

Returns:

Number of cache entries removed.

Return type:

int

fast_csv_loader.cache_stats() → dict

Added in version 2.2.0.

Return cache observability metrics including hit/miss counts, current cache size, and hit rate.

Returns:

Dictionary containing cache statistics:

  • hits: Number of cache hits

  • misses: Number of cache misses

  • evictions: Number of evicted entries

  • size: Current number of cached entries

  • hit_rate: Cache hit rate as a percentage (rounded to 1 decimal)

  • max_size: Maximum allowed cache size

Return type:

dict
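The hit_rate figure can be reproduced from hits and misses. The helper below is a sketch of that arithmetic; returning 0.0 when there have been no lookups is an assumption, since the behaviour for an unused cache is not documented here.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of lookups served from cache, rounded to 1 decimal."""
    total = hits + misses
    if total == 0:
        return 0.0  # assumption: an unused cache reports 0.0
    return round(100 * hits / total, 1)


print(hit_rate(49, 1))  # 49 hits out of 50 lookups -> 98.0
```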

fast_csv_loader.set_max_cache_size(n: int) → None

Added in version 2.2.0.

Set the maximum number of cached entries allowed.

If the cache exceeds this size, older entries will be evicted automatically.

Parameters:

n (int) – New maximum cache size. Must be greater than 0.

Raises:

ValueError – If n is less than or equal to 0.
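A bounded cache with automatic eviction might look like the following sketch. The eviction order (oldest inserted first), the `BoundedCache` class, and the default of 500 (taken from the example cache_stats output) are assumptions; the actual eviction policy is not specified here.

```python
from collections import OrderedDict


class BoundedCache:
    """Keep at most `max_size` entries, evicting the oldest first."""

    def __init__(self, max_size: int = 500) -> None:
        self._entries: OrderedDict = OrderedDict()
        self.max_size = max_size
        self.evictions = 0

    def set_max_size(self, n: int) -> None:
        if n <= 0:
            raise ValueError("max cache size must be greater than 0")
        self.max_size = n
        self._evict()  # shrink immediately if the new limit is smaller

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)  # a re-inserted key counts as newest
        self._evict()

    def keys(self) -> list:
        return list(self._entries)

    def _evict(self) -> None:
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the oldest entry
            self.evictions += 1


demo = BoundedCache(max_size=2)
for name in ("a", "b", "c"):
    demo.put(name, name.upper())
print(demo.keys(), demo.evictions)
```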