Installation
To install arm-preprocessing with pip, use:
$ pip install arm-preprocessing
To install arm-preprocessing on Alpine Linux, use:
$ apk add py3-arm-preprocessing
To install arm-preprocessing on Arch Linux, use an AUR helper:
$ yay -Syyu python-arm-preprocessing
Usage
This section demonstrates the usage of the arm-preprocessing framework.
Data loading
The following examples demonstrate how to load a dataset from a CSV, JSON, or TXT file, or from a directory of TCX files.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
# Load dataset
dataset.load()
# Print dataset information (columns, categories, min/max values, etc.)
dataset.dataset_statistics()
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/artm_test_dataset', format='json')
# Load dataset
dataset.load()
# Print dataset information (columns, categories, min/max values, etc.)
dataset.dataset_statistics()
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename, format, and datetime columns
dataset = Dataset('datasets/measures2', format='txt',
                  datetime_columns=['date', 'time'])
# Load dataset
dataset.load()
# Print dataset information (columns, categories, min/max values, etc.)
dataset.dataset_statistics()
from arm_preprocessing.dataset import Dataset
# Initialise dataset with path to TCX directory and format
dataset = Dataset('datasets/tcx', format='tcx')
# Load dataset
dataset.load()
# Print dataset information (columns, categories, min/max values, etc.)
dataset.dataset_statistics()
Missing values
The following examples demonstrate how to handle missing values in a dataset.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
# Load dataset
dataset.load()
# Remove columns with missing data
dataset.missing_values(method='column')
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
# Load dataset
dataset.load()
# Remove rows with missing data
dataset.missing_values(method='row')
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
# Load dataset
dataset.load()
# Impute missing data
dataset.missing_values(method='impute')
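To make the three strategies concrete, here is a minimal standard-library sketch of what dropping columns, dropping rows, and imputing mean on a toy table. It does not use arm-preprocessing and is not the library's implementation. The helper names and the toy data are hypothetical, and mean imputation is an assumption, since this section does not state which imputation strategy `method='impute'` applies.

```python
from statistics import mean

# Toy table: each dict is a row, None marks a missing value (illustrative data).
rows = [
    {"a": 1.0, "b": 10.0},
    {"a": None, "b": 20.0},
    {"a": 3.0, "b": 30.0},
]


def drop_rows(rows):
    """Keep only rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_columns(rows):
    """Keep only columns that have no missing value in any row."""
    complete = [c for c in rows[0] if all(r[c] is not None for r in rows)]
    return [{c: r[c] for c in complete} for r in rows]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values
    in its column (assumed strategy, for illustration only)."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in rows[0]}
    return [{c: (means[c] if r[c] is None else r[c]) for c in r} for r in rows]


print(drop_rows(rows))     # two rows remain
print(drop_columns(rows))  # only column "b" remains
print(impute_mean(rows))   # the missing "a" becomes 2.0
```

Dropping rows or columns discards data, whereas imputation keeps the table's shape at the cost of introducing estimated values.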
Data discretisation
The following examples demonstrate how to discretise a dataset.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
# Load dataset
dataset.load()
# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/measures2', format='txt',
                  datetime_columns=['date', 'time'])
# Load dataset
dataset.load()
# Discretise dataset using equal frequency discretisation
dataset.discretise(method='equal_frequency',
                   num_bins=3, columns=['temperature'])
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/measures2', format='txt',
                  datetime_columns=['date', 'time'])
# Load dataset
dataset.load()
# Discretise dataset using k-means discretisation
dataset.discretise(method='kmeans',
                   num_bins=5, columns=['temperature'])
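The difference between equal width and equal frequency binning can be illustrated with a short standard-library sketch. It shows the general techniques only and is not arm-preprocessing's implementation. The function names and sample values are hypothetical, and how the library labels bins or treats ties is not covered here.

```python
def equal_width_bins(values, num_bins):
    """Assign each value to one of num_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    # The maximum would land in bin index num_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def equal_frequency_bins(values, num_bins):
    """Assign each value to one of num_bins bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # The rank in sorted order decides the bin; tied values may be split.
        bins[i] = rank * num_bins // len(values)
    return bins


temperatures = [10, 12, 14, 20, 30, 50]
print(equal_width_bins(temperatures, 4))      # [0, 0, 0, 1, 2, 3]
print(equal_frequency_bins(temperatures, 3))  # [0, 0, 1, 1, 2, 2]
```

With skewed data, equal width binning crowds most values into the first bins, while equal frequency binning keeps bin sizes balanced but makes the intervals uneven.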
Data squashing
The following examples demonstrate how to squash a dataset.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
# Load dataset
dataset.load()
# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
# Load dataset
dataset.load()
# Drop "Sex" column from dataset.data
dataset.data.drop('Sex', axis=1, inplace=True)
# Squash dataset
dataset.squash(threshold=0.99, similarity='cosine')
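This section does not describe how squashing groups rows, so the following is only a sketch under stated assumptions: rows whose cosine similarity to a group's first member reaches the threshold are merged, and each group is replaced by its column-wise mean. The helper `greedy_squash` and the sample rows are hypothetical and are not part of arm-preprocessing. The sketch also shows why a non-numeric column such as "Sex" would presumably need to be dropped first, because the similarity is computed on numeric vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_squash(rows, threshold):
    """Hypothetical helper: put each row into the first group whose first
    member is similar enough, otherwise start a new group."""
    groups = []
    for row in rows:
        for group in groups:
            if cosine_similarity(group[0], row) >= threshold:
                group.append(row)
                break
        else:
            groups.append([row])
    # Represent each group by the column-wise mean of its members.
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


rows = [[1.0, 2.0], [2.0, 4.0], [5.0, 1.0]]
# The first two rows point in the same direction and are merged.
print(greedy_squash(rows, threshold=0.99))
```

A higher threshold merges fewer rows and keeps the dataset closer to its original size.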
Feature scaling
The following examples demonstrate how to scale a dataset.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
# Load dataset
dataset.load()
# Scale dataset using normalisation
dataset.scale(method='normalisation')
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
# Load dataset
dataset.load()
# Scale dataset using standardisation
dataset.scale(method='standardisation')
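For reference, the two scaling methods are commonly defined as min-max normalisation and z-score standardisation. The sketch below implements those textbook definitions with the standard library. It is not arm-preprocessing's code. The function names and sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def normalise(values):
    """Min-max normalisation: rescale values linearly into [0, 1].
    Assumes the column is not constant (max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Z-score standardisation: subtract the mean and divide by the
    population standard deviation (assumed; must be non-zero)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


lengths = [2.0, 4.0, 6.0, 8.0]
print(normalise(lengths))    # runs from 0.0 to 1.0
print(standardise(lengths))  # centred on 0 with unit standard deviation
```

Normalisation bounds every value between 0 and 1 but is sensitive to outliers, while standardisation leaves values unbounded and expresses them in units of standard deviation.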
Feature selection
The following examples demonstrate how to select features from a dataset.
from arm_preprocessing.dataset import Dataset
# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
# Load dataset
dataset.load()
# Select features using the Kendall method, with 'calories' as the class column
dataset.feature_selection(
    method='kendall', threshold=0.15, class_column='calories')
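As an illustration of correlation-based selection, the sketch below computes Kendall's tau (the tau-a variant) in plain Python. It keeps features whose absolute correlation with the class column reaches the threshold. That selection rule is an assumption, since this section does not state how `threshold` is applied. The function names and the toy columns are hypothetical and not part of arm-preprocessing.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs divided by all pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def select_features(columns, class_column, threshold):
    """Keep features whose |tau| with the class column is at least threshold
    (assumed rule, for illustration only)."""
    target = columns[class_column]
    return [
        name
        for name, values in columns.items()
        if name != class_column and abs(kendall_tau(values, target)) >= threshold
    ]


columns = {
    "duration": [30, 45, 60, 90],       # rises with calories, tau = 1
    "noise": [3, 1, 1, 3],              # unrelated to calories, tau = 0
    "calories": [200, 310, 400, 650],
}
print(select_features(columns, class_column="calories", threshold=0.15))
# ['duration']
```

Kendall's tau depends only on the ordering of values, so it captures monotonic relationships without assuming they are linear.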