Read a large SAS file (larger than memory) in Python

import pandas as pd
import pyreadstat

filename = 'foo.SAS7BDAT'
CHUNKSIZE = 50000
offset = 0

# Read the first chunk and store every column as categorical to reduce memory use
# for xpt data, use pyreadstat.read_xport() here and in the loop below
allChunk, _ = pyreadstat.read_sas7bdat(filename, row_limit=CHUNKSIZE, row_offset=offset)
allChunk = allChunk.astype('category')

while True:
    offset += CHUNKSIZE
    chunk, _ = pyreadstat.read_sas7bdat(filename, row_limit=CHUNKSIZE, row_offset=offset)
    if chunk.empty:
        break  # an empty chunk means the entire file has been read

    for eachCol in chunk:
        # Give both frames the same category set so concat keeps the categorical dtype
        colUnion = pd.api.types.union_categoricals([allChunk[eachCol], chunk[eachCol].astype('category')])
        allChunk[eachCol] = pd.Categorical(allChunk[eachCol], categories=colUnion.categories)
        chunk[eachCol] = pd.Categorical(chunk[eachCol], categories=colUnion.categories)

    # Append the chunk to the result; ignore_index avoids repeated row labels
    allChunk = pd.concat([allChunk, chunk], ignore_index=True)
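
The category-merging step in the loop above can be pulled into a small helper that works on any iterable of DataFrame chunks, which makes it easy to try without a SAS file. This is a minimal sketch, not part of the original snippet: the name `concat_as_categorical` is hypothetical, and it assumes each column has the same underlying dtype in every chunk (`union_categoricals` raises a `TypeError` otherwise).

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_as_categorical(chunks):
    """Concatenate DataFrame chunks, storing every column as a categorical.

    Hypothetical helper: assumes all chunks share the same columns and that
    each column has a consistent dtype across chunks.
    """
    result = None
    for chunk in chunks:
        chunk = chunk.astype("category")  # astype returns a new frame
        if result is None:
            result = chunk
            continue
        for col in chunk.columns:
            # Build the combined category set, then recode both sides to it
            union = union_categoricals([result[col], chunk[col]])
            result[col] = result[col].cat.set_categories(union.categories)
            chunk[col] = chunk[col].cat.set_categories(union.categories)
        # Identical category sets let concat keep the categorical dtype
        result = pd.concat([result, chunk], ignore_index=True)
    return result


# Small in-memory chunks standing in for pieces of a large file
demo_chunks = [
    pd.DataFrame({"a": ["x", "y"], "b": [1, 2]}),
    pd.DataFrame({"a": ["y", "z"], "b": [2, 3]}),
]
combined = concat_as_categorical(demo_chunks)
print(combined.dtypes)
```

With pyreadstat, the chunks could come from the same `read_sas7bdat(..., row_limit=..., row_offset=...)` calls used above. pyreadstat also documents a `read_file_in_chunks` generator that yields `(df, meta)` pairs, which may be a simpler source of chunks; check its documentation for the exact arguments.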
