Algorithm
This is a broad overview of the csv_loader algorithm. For a more detailed understanding, please read the code itself.
To maximise efficiency, all file operations are performed in binary mode, on raw bytes.
Assume you wish to load the last 160 lines of the file.
Get the file size.
If the size is less than chunk_size (or 19 KB, whichever is higher), load the entire file:
If end_date is provided, filter the data up to end_date and return the last 160 lines.
Otherwise, return the last 160 lines.
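The size check and small-file shortcut could be sketched roughly as follows. This is a minimal illustration, not the csv_loader code: the function name load_small_file, the default chunk_size, the assumption that the first column holds an ISO-formatted date in ascending order, the inclusive end_date comparison, and the use of the standard csv module in place of Pandas are all assumptions made for the example.

```python
import csv
import io
import os
from datetime import datetime

MIN_SIZE = 19 * 1024  # the 19 KB floor mentioned above


def load_small_file(path, lines=160, chunk_size=6 * 1024, end_date=None):
    """Return (header, rows) for a small file, or None if the file is large.

    Hypothetical helper: assumes the first column is an ISO date string
    and that rows are stored in ascending date order.
    """
    size = os.path.getsize(path)

    if size >= max(chunk_size, MIN_SIZE):
        return None  # large file: the chunked path would be used instead

    # Small file: read everything in one go, in binary mode.
    with open(path, "rb") as f:
        raw = f.read()

    reader = csv.reader(io.StringIO(raw.decode("utf-8"), newline=""))
    header = next(reader)
    rows = [row for row in reader if row]

    if end_date is not None:
        # Keep only rows dated on or before end_date (inclusive by assumption).
        rows = [r for r in rows if datetime.fromisoformat(r[0]) <= end_date]

    return header, rows[-lines:]
```

Returning None for large files is only a convenience of this sketch, so that a caller can fall through to the chunked read described in the remaining steps.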
Read the first line of the file to get the column header.
Seek to the end of the file.
Read the last N bytes (Chunk) of the file.
On the first chunk, count the line breaks (\n) in the chunk to estimate the number of lines per chunk.
On every chunk:
Update the number of lines read by adding the lines per chunk.
Store the chunk in a list.
If end_date is specified, parse the first date string in the chunk to check whether we are past end_date. If so, continue until the desired number of lines has been loaded.
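The backward read described above might look something like the following. It is a sketch under stated assumptions rather than the csv_loader implementation: read_tail_chunks and its parameters are hypothetical names, the end_date check is omitted for brevity, and the running line count is only an estimate, based on the number of line breaks found in the first chunk.

```python
import os


def read_tail_chunks(path, lines_needed, chunk_size=4096):
    """Read chunks backwards from the end of the file until the estimated
    number of lines read reaches `lines_needed`.

    Returns (header, chunks), where `chunks` is ordered last-chunk-first.
    """
    chunks = []
    lines_read = 0
    lines_per_chunk = None

    with open(path, "rb") as f:
        header = f.readline()          # first line: the column header
        data_start = f.tell()          # never read back past this offset
        pos = f.seek(0, os.SEEK_END)   # seek() returns the new position

        while pos > data_start and lines_read < lines_needed:
            start = max(data_start, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            pos = start

            if lines_per_chunk is None:
                # Estimate once, from the first chunk read (the file's tail).
                lines_per_chunk = max(1, chunk.count(b"\n"))

            lines_read += lines_per_chunk
            chunks.append(chunk)

    return header, chunks
```

Because the count is an estimate and a chunk usually begins mid-line, the chunks collected this way can hold slightly more or fewer complete lines than requested.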
Once we have the desired number of lines:
Append the column header and final chunk.
Reverse the list and join it into an io.BytesIO object.
Load it into a Pandas DataFrame and return the slice of data required.
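To illustrate the final assembly, here is a small sketch that uses hard-coded chunks in place of a real backward read. The chunk contents and the helper name assemble are made up for the example, and the standard csv module stands in for Pandas so the sketch has no dependencies. The way csv_loader itself handles the partial first line may differ.

```python
import csv
import io


def assemble(header, chunks):
    """Build a file-like object from chunks collected last-chunk-first."""
    parts = list(chunks)
    parts.append(header)   # the header goes last so that it ends up first...
    parts.reverse()        # ...once the list is reversed
    return io.BytesIO(b"".join(parts))


# Made-up chunks, in the order a backward read would collect them.
header = b"Date,Close\n"
chunks = [
    b"05,104\n2023-01-06,105\n",            # read first (end of the file)
    b"-03,102\n2023-01-04,103\n2023-01-",   # read second; starts mid-line
]

buf = assemble(header, chunks)

# Parse the joined bytes; csv.reader is used here instead of Pandas.
rows = list(csv.reader(io.TextIOWrapper(buf, encoding="utf-8", newline="")))
columns, data = rows[0], rows[1:]

# Taking the tail also discards the partial row at the top of the data.
last_three = data[-3:]
```

With Pandas, the last two steps would roughly correspond to passing the buffer to pandas.read_csv and slicing the result with .iloc[-3:].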