autoflow.evaluation package

Submodules

autoflow.evaluation.base module

autoflow.evaluation.ensemble_evaluator module

autoflow.evaluation.train_evaluator module

class autoflow.evaluation.train_evaluator.TrainEvaluator(run_id, data_manager: autoflow.data_manager.DataManager, resource_manager: autoflow.resource_manager.base.ResourceManager, random_state: int, metric: autoflow.metrics.Scorer, groups: List[int], should_calc_all_metric: bool, splitter, should_store_intermediate_result: bool, should_stack_X: bool, should_finally_fit: bool, model_registry: dict, budget2kfold: Optional[Dict[float, int]] = None, algo2budget_mode: Optional[Dict[str, str]] = None, algo2iter: Optional[Dict[str, int]] = None, nameserver=None, nameserver_port=None, host=None, worker_id=None, timeout=None)[source]

Bases: autoflow.hpbandster.core.worker.Worker, autoflow.utils.klass.StrSignatureMixin

Parameters
  • run_id (anything with a __str__ method) – a unique id to identify an individual HpBandSter run

  • nameserver (str) – hostname or IP of the nameserver

  • nameserver_port (int) – port of the nameserver

  • logger (logging.Logger instance) – logger used for debugging output

  • host (str) – hostname for this worker process

  • worker_id (anything with a __str__ method) – if multiple workers are started in the same process, you MUST provide a unique id for each one of them using this argument.

  • timeout (int or float or None) – specifies how long a worker will wait for a new job after finishing a computation before shutting down. Towards the end of a long run with multiple workers, this helps to shut down idling workers. We recommend a timeout that is roughly half the time it would take for the second largest budget to finish. The default (None) means that the worker will wait indefinitely and never shut down on its own.
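A minimal sketch of constructing and running a TrainEvaluator worker against a nameserver. The objects passed in below (data_manager, resource_manager, scorer) are hypothetical placeholders for objects the surrounding AutoFlow run builds elsewhere; only the keyword names come from the signature above:

    from sklearn.model_selection import KFold
    from autoflow.evaluation.train_evaluator import TrainEvaluator

    # data_manager, resource_manager and scorer are assumed to exist already;
    # building them is outside the scope of this sketch.
    evaluator = TrainEvaluator(
        run_id="example-run",               # anything with a __str__ method
        data_manager=data_manager,
        resource_manager=resource_manager,
        random_state=42,
        metric=scorer,
        groups=[],
        should_calc_all_metric=True,
        splitter=KFold(n_splits=5, shuffle=True, random_state=42),
        should_store_intermediate_result=False,
        should_stack_X=True,
        should_finally_fit=False,
        model_registry={},
        nameserver="127.0.0.1",             # hostname or IP of the nameserver
        nameserver_port=9090,
        host="127.0.0.1",
        worker_id="worker-0",               # must be unique within one process
        timeout=None,                       # wait indefinitely for new jobs
    )
    # run() is inherited from the hpbandster Worker base class;
    # background=True serves compute() jobs in a background thread.
    evaluator.run(background=True)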

compute(config: dict, config_info: dict, budget: float, **kwargs)[source]

The function you have to overload to implement your computation.

Parameters
  • config_id (tuple) – a triplet of ints that uniquely identifies a configuration. The convention is id = (iteration, budget index, running index), with the following meaning (a worked illustration follows this parameter list):
    - iteration: the iteration of the optimization algorithm, e.g., for Hyperband this is one round of Successive Halving
    - budget index: the budget (of the current iteration) for which this configuration was sampled by the optimizer. This is only nonzero if the majority of the runs fail and Hyperband resamples to fill empty slots, or if you use a more ‘advanced’ optimizer.
    - running index: simply an int >= 0 that sorts the configs into the order in which they were sampled, i.e. (x,x,0) was sampled before (x,x,1)

  • config (dict) – the actual configuration to be evaluated.

  • budget (float) – the budget for the evaluation

  • working_directory (str) – a name of a directory that is unique to this configuration. Use this to store intermediate results on lower budgets that can be reused later for a larger budget (for iterative algorithms, for example).
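As referenced above, a brief worked illustration of the config_id convention (the values are hypothetical):

    # (iteration, budget index, running index)
    config_id = (3, 0, 2)
    # iteration 3 of the optimizer, e.g. the 4th round of Successive Halving;
    # sampled for budget index 0 of that iteration; and the 3rd configuration
    # sampled there, i.e. (3, 0, 0) and (3, 0, 1) were sampled before it.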

Returns

needs to return a dictionary with two mandatory entries:
  • ‘loss’: a numerical value that is MINIMIZED

  • ‘info’: this can be pretty much any built-in Python type, e.g. a dict with lists as values. Because Pyro4 handles the remote function calls, 3rd-party types like numpy arrays are not supported!

Return type

dict
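For orientation, a hedged sketch of the contract a compute overload must satisfy. The body is illustrative only (the real TrainEvaluator builds an ML_Workflow from the config and evaluates it under the budget); it shows just the mandatory return shape:

    def compute(self, config: dict, config_info: dict, budget: float, **kwargs) -> dict:
        # Illustrative body only: stand-in values instead of real training.
        loss = 0.0  # a numerical value that is MINIMIZED
        info = {"budget": budget, "n_hyperparams": len(config)}
        # Keep the result to built-in Python types: Pyro4 handles the remote
        # function calls, so 3rd-party types like numpy arrays are rejected.
        return {"loss": loss, "info": info}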

create_component(sub_dhp: Dict, phase: str, step_name, in_feature_groups='all', out_feature_groups='all', outsideEdge_info=None)[source]
create_estimator(dhp: Dict) → autoflow.workflow.ml_workflow.ML_Workflow[source]
create_preprocessor(dhp: Dict) → Optional[autoflow.workflow.ml_workflow.ML_Workflow][source]
evaluate(config_id, model: autoflow.workflow.ml_workflow.ML_Workflow, X, y, X_test, y_test, budget)[source]
get_Xy()[source]
get_cache_key(config_id, X_train: autoflow.data_container.dataframe.DataFrameContainer, y_train: autoflow.data_container.ndarray.NdArrayContainer)[source]
loss(y_true, y_hat)[source]
parse_key(key: str)[source]
shp2model(shp)[source]

Module contents