Configuration Parameters
Many configuration parameters are available to customize the behavior of the application. The following tables list the available parameters for each provider, together with their recommended values.
PathProvider
Logger.logpath
Recommended Value
DEFAULT_LOG_PATH
Get the default log path by importing it from the Logger module
Description
The path to the log folder. The log folder will contain all the log files generated by the pipeline user file.
InitialGathererPathProvider
DataGatherer.filepath
Recommended Value
Description
The path function for the file containing the data used for the initial gathering. The file should be a CSV file.
DataGatherer.subsectionpath
Recommended Value
Description
The path function for the file containing the subsection of the data used for the initial gathering. The file should be a CSV file. It holds a subsection extracted from the massive source data file. This parameter is used in both the regular (non-large) pipeline and the large pipeline.
DataGatherer.splitfilespath
Recommended Value
Description
The path function to the folder containing the split files of the data. The split files will be individually cleaned and then merged later. This folder is only used in the large pipeline.
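The parameters in this provider are described as path functions rather than plain strings. As a minimal sketch, assuming a path function is simply a callable that receives a model name and returns a path, the entries might look like the following. The argument name `model`, the directory layout, and the `resolve_paths` helper are all hypothetical and are not taken from the project.

```python
# Hypothetical path functions keyed by the parameter names above.
# The directory layout is illustrative only.
initial_gatherer_paths = {
    "DataGatherer.filepath": lambda model: "data/data.csv",
    "DataGatherer.subsectionpath": lambda model: f"data/{model}/subsection.csv",
    "DataGatherer.splitfilespath": lambda model: f"data/{model}/splitfiles/",
}


def resolve_paths(path_functions, model):
    """Call every path function with the model name and return plain strings."""
    return {key: func(model) for key, func in path_functions.items()}


resolved = resolve_paths(initial_gatherer_paths, "example-model")
```

Using a callable instead of a fixed string lets one configuration produce separate folders per model, which would keep files from different runs apart.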
GeneratorPathProvider
ConnectedDrivingLargeDataCleaner.cleanedfilespath
Recommended Value
Description
The path function to the folder containing the cleaned split files of the data. The split files will be individually cleaned and then stored in this folder. This folder is only used in the large pipeline.
ConnectedDrivingLargeDataCleaner.combinedcleandatapath
Recommended Value
Description
The path function to the file containing the combined cleaned data. This file is only used in the large pipeline.
MLPathProvider
MConnectedDrivingDataCleaner.cleandatapath
Recommended Value
Description
The path function to the file containing the attacked and cleaned data. This file is used in every pipeline to get the data to be used for training and testing.
MDataClassifier.plot_confusion_matrix_path
Recommended Value
Description
The path function to the folder containing the confusion matrix plots. This folder is used to store the confusion matrix plots generated by each classifier.
GeneratorContextProvider
DataGatherer.numrows
Recommended Value
Description
The number of rows to gather and store as a subsection from the original dataset. This parameter is used in both the regular (non-large) pipeline and the large pipeline.
DataGatherer.lines_per_file
Recommended Value
Description
The number of rows to store in each split file. This parameter is only used in the large pipeline.
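To illustrate what a rows-per-file setting controls, here is a rough sketch of splitting a CSV file into chunks using only the standard library. It assumes each split file repeats the header row and that files are named `split0.csv`, `split1.csv`, and so on. Both choices, and the function names, are assumptions made for illustration, not the project's actual behavior.

```python
import csv
import os


def write_chunk(out_dir, index, header, rows):
    """Write one split file containing the header followed by the given rows."""
    path = os.path.join(out_dir, f"split{index}.csv")  # hypothetical naming scheme
    with open(path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(source_path, out_dir, lines_per_file):
    """Split source_path into files holding at most lines_per_file data rows each."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == lines_per_file:
                written.append(write_chunk(out_dir, len(written), header, chunk))
                chunk = []
        if chunk:  # final, possibly shorter, chunk
            written.append(write_chunk(out_dir, len(written), header, chunk))
    return written
```

Reading row by row keeps memory use bounded by `lines_per_file`, which matters when the source file is too large to load at once.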
ConnectedDrivingCleaner.x_pos
Recommended Value
Description
The x coordinate of the center of the circle to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
ConnectedDrivingCleaner.y_pos
Recommended Value
Description
The y coordinate of the center of the circle to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
ConnectedDrivingCleaner.columns
Recommended Value
["metadata_generatedAt", "metadata_recordType", "metadata_serialId_streamId",
"metadata_serialId_bundleSize", "metadata_serialId_bundleId", "metadata_serialId_recordId",
"metadata_serialId_serialNumber", "metadata_receivedAt",
# "metadata_rmd_elevation", "metadata_rmd_heading","metadata_rmd_latitude", "metadata_rmd_longitude", "metadata_rmd_speed",
# "metadata_rmd_rxSource","metadata_bsmSource",
"coreData_id", "coreData_secMark", "coreData_position_lat", "coreData_position_long",
"coreData_accuracy_semiMajor", "coreData_accuracy_semiMinor",
"coreData_elevation", "coreData_accelset_accelYaw","coreData_speed", "coreData_heading", "coreData_position"]
Description
The columns to keep when filtering the initial data. Most columns in the source data are not needed, so only these are carried through the pipeline.
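As a small sketch of what a column list like this does, the snippet below restricts each row to the configured columns. The `keep_columns` helper and the sample row are made up for illustration, and the column list is shortened.

```python
def keep_columns(rows, columns):
    """Return copies of the rows restricted to the given columns, in that order."""
    return [{name: row[name] for name in columns} for row in rows]


# Shortened column list and a made-up sample row, for illustration only.
columns = ["coreData_id", "coreData_speed"]
rows = [{"coreData_id": "A1", "coreData_speed": "12.5", "metadata_bsmSource": "RV"}]
filtered = keep_columns(rows, columns)
```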
ConnectedDrivingLargeDataCleaner.max_dist
Recommended Value
Description
The maximum distance from the center of the circle to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
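The circle defined by x_pos, y_pos, and max_dist can be pictured with a short sketch. It assumes the rows already carry planar `x_pos` and `y_pos` values in meters and uses plain Euclidean distance with an inclusive boundary. The `within_range` function is hypothetical, and the project's own filter may compute distance differently, for example on latitude and longitude.

```python
import math


def within_range(rows, x_pos, y_pos, max_dist):
    """Keep rows whose (x_pos, y_pos) point lies within max_dist of the center."""
    return [
        row
        for row in rows
        if math.hypot(row["x_pos"] - x_pos, row["y_pos"] - y_pos) <= max_dist
    ]


# Made-up points: the first is 500 m from the center, the second about 1204 m.
sample = [{"x_pos": 300.0, "y_pos": 400.0}, {"x_pos": 800.0, "y_pos": 900.0}]
kept = within_range(sample, x_pos=0.0, y_pos=0.0, max_dist=1000.0)
```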
ConnectedDrivingCleaner.shouldGatherAutomatically
Recommended Value
Description
Whether to gather the data automatically when the cleaner is not given any data. It is off by default so that data is never collected without the user's knowledge.
ConnectedDrivingLargeDataCleaner.cleanerClass
Recommended Value
Description
The class used for cleaning the data. This should be the class that cleanFunc belongs to.
ConnectedDrivingLargeDataCleaner.cleanFunc
Recommended Value
Description
The function used for cleaning the data. This should be a function of the class given in cleanerClass.
ConnectedDrivingLargeDataCleaner.cleanerWithFilterClass
Recommended Value
Description
The class used for filtering the data. This should be the class that filterFunc belongs to.
ConnectedDrivingLargeDataCleaner.filterFunc
Recommended Value
Description
The function used for filtering the data. This should be a function of the class given in cleanerWithFilterClass.
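Because each class parameter has to agree with its function parameter, a quick consistency check can catch a mismatched pair early. The sketch below uses a stand-in `ExampleCleaner` class and a hypothetical `func_belongs_to` helper. Neither is part of the project, and the same check would apply to the cleanerWithFilterClass and filterFunc pair.

```python
class ExampleCleaner:
    """Stand-in cleaner class; not one of the project's classes."""

    def __init__(self, rows):
        self.rows = rows

    def clean_data(self):
        # Drop rows that contain any missing value.
        self.rows = [r for r in self.rows if all(v is not None for v in r.values())]
        return self


def unrelated_func(cleaner):
    """A function that is not defined on ExampleCleaner."""
    return cleaner


def func_belongs_to(cls, func):
    """Return True if func is the attribute of the same name on cls."""
    return getattr(cls, func.__name__, None) is func


config = {
    "ConnectedDrivingLargeDataCleaner.cleanerClass": ExampleCleaner,
    "ConnectedDrivingLargeDataCleaner.cleanFunc": ExampleCleaner.clean_data,
}

cleaner_class = config["ConnectedDrivingLargeDataCleaner.cleanerClass"]
clean_func = config["ConnectedDrivingLargeDataCleaner.cleanFunc"]
pair_is_consistent = func_belongs_to(cleaner_class, clean_func)

# Calling the unbound function on an instance of the matching class.
cleaned = clean_func(cleaner_class([{"a": 1}, {"a": None}])).rows
```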
ConnectedDrivingAttacker.SEED
Recommended Value
Description
The seed to be used for the random number generator. This parameter is only used in the large pipeline when generating the attack data.
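The point of fixing a seed is that the same configuration selects the same rows on every run. A minimal sketch with the standard library follows. The seed value 42 is arbitrary, `pick_sample` is a hypothetical helper, and the project may seed a different generator.

```python
import random


def pick_sample(items, k, seed):
    """Draw k items reproducibly using a private generator seeded with seed."""
    rng = random.Random(seed)  # leaves the global random state untouched
    return rng.sample(items, k)


first = pick_sample(list(range(100)), 5, seed=42)
second = pick_sample(list(range(100)), 5, seed=42)
same_selection = first == second  # identical seeds give identical selections
```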
ConnectedDrivingCleaner.isXYCoords
Recommended Value
Description
Whether the data is in (x, y) coordinates, that is, distances in meters from the reference point along the longitude and latitude axes respectively.
ConnectedDrivingAttacker.attack_ratio
Recommended Value
Description
The ratio of the data to be attacked. For example, 0.3 means 30% of the data will be attacked. How the attack ratio is applied depends on the attack distribution method: add_attackers treats it as the ratio of cars that are attackers, and 100% of their BSMs are attacked, whereas add_rand_attackers treats it as the ratio of BSMs to be attacked at random.
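The difference between the two distribution methods can be sketched as follows. These functions are illustrative stand-ins and not the project's add_attackers or add_rand_attackers implementations. They assume `coreData_id` identifies a car, that the result is stored in an `isAttacker` column, and that an exact fraction (rounded) is selected rather than a per-row probability.

```python
import random


def attackers_by_vehicle(rows, attack_ratio, seed):
    """Mark a ratio of vehicle IDs as attackers; every BSM from them is attacked."""
    rng = random.Random(seed)
    ids = sorted({row["coreData_id"] for row in rows})
    chosen = set(rng.sample(ids, round(len(ids) * attack_ratio)))
    return [dict(row, isAttacker=int(row["coreData_id"] in chosen)) for row in rows]


def attackers_by_bsm(rows, attack_ratio, seed):
    """Mark a ratio of individual BSMs as attacked, regardless of vehicle."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(rows)), round(len(rows) * attack_ratio)))
    return [dict(row, isAttacker=int(i in chosen)) for i, row in enumerate(rows)]


# Made-up data: 10 cars with 10 BSMs each.
bsms = [
    {"coreData_id": f"car{v}", "coreData_secMark": s}
    for v in range(10)
    for s in range(10)
]
by_vehicle = attackers_by_vehicle(bsms, 0.3, seed=1)
by_bsm = attackers_by_bsm(bsms, 0.3, seed=1)
```

With the first method, 3 of the 10 cars have all of their BSMs flagged. With the second, 30 of the 100 BSMs are flagged and may be spread across any of the cars.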
ConnectedDrivingCleaner.cleanParams
Recommended Value
f"clean_data_with_timestamps-within_rangeXY-WithXYCoords-1000mdist-x{x_pos_str}y{y_pos_str}dd02mm04yyyy2021"
Description
A name identifying the parameters used for cleaning the data. It is used for caching, so it should reflect cleanFunc plus any other parameters in use, such as the filter.
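One way to keep such a name in step with the settings it describes is to build it from them. The sketch below reproduces the shape of the recommended value above. The `num_to_str` conversion is a guess at how x_pos_str and y_pos_str might be produced, and `build_clean_params` and the coordinates are hypothetical.

```python
def num_to_str(value):
    """Hypothetical helper: render a number so it is safe inside a file name."""
    return str(value).replace("-", "N").replace(".", "_")


def build_clean_params(clean_func_name, x_pos, y_pos, max_dist, day, month, year):
    """Compose a cache name that changes whenever a cleaning parameter changes."""
    return (
        f"{clean_func_name}-within_rangeXY-WithXYCoords-{max_dist}mdist-"
        f"x{num_to_str(x_pos)}y{num_to_str(y_pos)}"
        f"dd{day:02d}mm{month:02d}yyyy{year}"
    )


# Illustrative coordinates only; they are not the project's recommended values.
name = build_clean_params("clean_data_with_timestamps", -105.5, 41.25, 1000, 2, 4, 2021)
```

Deriving the name from the parameters means a changed coordinate or date cannot silently reuse a cache entry produced with the old settings.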
MLContextProvider
MConnectedDrivingDataCleaner.columns
Recommended Value
[
# "metadata_generatedAt", "metadata_recordType", "metadata_serialId_streamId",
# "metadata_serialId_bundleSize", "metadata_serialId_bundleId", "metadata_serialId_recordId",
# "metadata_serialId_serialNumber", "metadata_receivedAt",
# "metadata_rmd_elevation", "metadata_rmd_heading","metadata_rmd_latitude", "metadata_rmd_longitude", "metadata_rmd_speed",
# "metadata_rmd_rxSource","metadata_bsmSource",
"coreData_id", # "coreData_position_lat", "coreData_position_long",
"coreData_secMark", "coreData_accuracy_semiMajor", "coreData_accuracy_semiMinor",
"month", "day", "year", "hour", "minute", "second", "pm",
"coreData_elevation", "coreData_accelset_accelYaw", "coreData_speed", "coreData_heading", "x_pos", "y_pos", "isAttacker"],
Description
The columns to be used for training the model (and also the final columns after the attacker finishes).
MClassifierPipeline.classifier_instances
Recommended Value (AUTO_FILLED)
Description
The classifiers used for training the model. These are autofilled, but they can be changed to use different classifiers. The example class on the development page defines a CLASSIFIER_INSTANCES variable at the top for the pipeline. It was left out of the config because the value is autofilled. To use different classifiers, modify that array and pass it in. Make sure to include the modified parameters in your LOG_NAME and file name to avoid caching errors.
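To make the caching advice concrete, the sketch below derives part of LOG_NAME from the classifier list, so that changing the list also changes the name. The two classifier classes are empty stand-ins, and `classifier_tag` and the `ExamplePipeline` prefix are hypothetical. A real setup would pass actual classifier instances.

```python
class ExampleTreeClassifier:
    """Stand-in for a real classifier; only the class name matters here."""


class ExampleForestClassifier:
    """Stand-in for a real classifier; only the class name matters here."""


def classifier_tag(instances):
    """Join the classifier class names into a short tag for use in names."""
    return "-".join(type(instance).__name__ for instance in instances)


CLASSIFIER_INSTANCES = [ExampleTreeClassifier(), ExampleForestClassifier()]
LOG_NAME = f"ExamplePipeline-{classifier_tag(CLASSIFIER_INSTANCES)}"
```

A tag based on class names alone does not capture changed hyperparameters, so those would still need to be added to the name by hand.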
CleanerWithFilterWithinRangeXYAndDay.day
Recommended Value
Description
The day to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
CleanerWithFilterWithinRangeXYAndDay.month
Recommended Value
Description
The month to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
CleanerWithFilterWithinRangeXYAndDay.year
Recommended Value
Description
The year to be used for filtering the data. This parameter is only used in the large pipeline when filtering the data to be within a certain range.
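Taken together, the day, month, and year parameters select a single calendar date. A minimal sketch, assuming each row carries integer `day`, `month`, and `year` columns, is shown below. The `on_date` function and the sample rows are made up for illustration.

```python
def on_date(rows, day, month, year):
    """Keep only rows recorded on the given calendar date."""
    wanted = (day, month, year)
    return [row for row in rows if (row["day"], row["month"], row["year"]) == wanted]


# Made-up rows for illustration.
records = [
    {"coreData_id": "A1", "day": 2, "month": 4, "year": 2021},
    {"coreData_id": "B2", "day": 3, "month": 4, "year": 2021},
]
same_day = on_date(records, day=2, month=4, year=2021)
```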
MDataClassifier.plot_distribution_path
Recommended Value
Description
The path to be used for plotting the distribution of the data. This parameter is only used in the large pipeline when plotting the distribution of the data during feature analysis.