{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 03. Conditional Variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
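"In the previous two tutorials, you learned how to define configuration spaces with multiple parameters. In some scenarios, such as AutoML, you need to define a space with conditional dependencies. This tutorial has two parts: (1) hierarchical parameters and (2) condition and forbidden statements."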
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hierarchical Variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
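"Suppose we want to define the configuration space of an AutoML problem. We must not only choose a learner from a list of candidates, which is **algorithm selection** (Algorithm Selection, AS), but also optimize the hyperparameters of the chosen learner, which is **hyperparameter optimization** (Hyperparameter Optimization, HPO).\n",
"\n",
"An AutoML problem can therefore be formulated as the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem [[1]](#refer-anchor-1)."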
]
},
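{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the CASH problem can be written roughly as in [[1]](#refer-anchor-1). Given candidate algorithms $\\mathcal{A} = \\{A^{(1)}, \\ldots, A^{(k)}\\}$ with hyperparameter spaces $\\Lambda^{(1)}, \\ldots, \\Lambda^{(k)}$, a loss $\\mathcal{L}$, and a dataset split into $K$ cross-validation folds $D^{(i)}_{train}, D^{(i)}_{valid}$, CASH looks for the algorithm and hyperparameters with the lowest average validation loss:\n",
"\n",
"$$A^{*}_{\\lambda^{*}} \\in \\underset{A^{(j)} \\in \\mathcal{A},\\ \\lambda \\in \\Lambda^{(j)}}{\\arg\\min} \\frac{1}{K} \\sum_{i=1}^{K} \\mathcal{L}\\left(A^{(j)}_{\\lambda}, D^{(i)}_{train}, D^{(i)}_{valid}\\right)$$\n",
"\n",
"Picking $A^{(j)}$ is algorithm selection and picking $\\lambda \\in \\Lambda^{(j)}$ is hyperparameter optimization. Because the space $\\Lambda^{(j)}$ depends on which algorithm was picked, the joint search space is hierarchical."
]
},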
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![CASH](https://img-blog.csdnimg.cn/20201223130020129.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
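"For example, as shown in the figure above [[2]](#refer-anchor-2), we can choose the classifier from SVM, LDA, RandomForest, and so on (**algorithm selection**). If we choose SVM, we also need to optimize parameters such as C and gamma (**hyperparameter optimization**); if the kernel is rbf, the parameter coef$_0$ is activated, and so on."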
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
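"We therefore need to define a hierarchical configuration space. Such a space can be viewed as a directed acyclic graph (DAG); in practice, it can also be treated as a tree."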
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Following the figure above, let's first use HDL to define a minimal hierarchical configuration space:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from ultraopt.hdl import hdl2cs, layering_config, plot_layered_dict, plot_hdl"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"HDL = {\n",
" \"Classifier(choice)\":{\n",
" \"SVM\": {\n",
" \"C\": {\"_type\": \"loguniform\", \"_value\": [0.01, 10000], \"_default\": 1.0},\n",
" },\n",
" \"LDA\": {\n",
" \"n_components\": {\"_type\": \"int_uniform\", \"_value\": [2, 9], \"_default\": 2},\n",
" }\n",
" } \n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
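"As the HDL definition above shows, an **algorithm selection variable** must use the `(choice)` suffix. Its value is a dict in which each key is the name of a candidate algorithm.\n",
"\n",
"You can visualize the hyperparameters with `ultraopt.hdl.plot_hdl`:\n",
"\n",
"- **Hexagon** nodes: algorithm selection variables, i.e., variables with the `(choice)` suffix\n",
"- **Rectangle** nodes: hyperparameter spaces. They carry no value range of their own; they only **hold hyperparameter variables** and **indicate the result of algorithm selection**\n",
"- **Ellipse** nodes: hyperparameter variables"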
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"CS = hdl2cs(HDL)\n",
"plot_hdl(HDL)"
]
},
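{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three node kinds in the legend above can be read directly off the HDL dict. The sketch below walks an HDL and labels each node the same way. `classify_hdl_nodes` is a hypothetical helper written for illustration, not part of UltraOpt, and it assumes only the conventions used in this tutorial: a key ending in `(choice)` is an algorithm selection variable whose keys name candidate spaces, a dict containing `_type` is a hyperparameter, and keys starting with `__` are special statements. Node paths are joined with `:`, the same separator UltraOpt uses in flat configurations later in this tutorial.\n",
"\n",
"```python\n",
"def classify_hdl_nodes(hdl, prefix=''):\n",
"    # Hypothetical helper: return (path, kind) pairs for every node of an HDL dict.\n",
"    # kind is 'choice' (hexagon), 'space' (rectangle) or 'hyperparameter' (ellipse).\n",
"    nodes = []\n",
"    for key, value in hdl.items():\n",
"        if key.startswith('__'):\n",
"            continue  # special statements such as __activate are not nodes\n",
"        if key.endswith('(choice)'):\n",
"            path = prefix + key[:-len('(choice)')]\n",
"            nodes.append((path, 'choice'))\n",
"            # every key of an algorithm selection variable names a candidate space\n",
"            for candidate, sub_hdl in value.items():\n",
"                nodes.append((path + ':' + candidate, 'space'))\n",
"                nodes.extend(classify_hdl_nodes(sub_hdl, path + ':' + candidate + ':'))\n",
"        elif isinstance(value, dict) and '_type' in value:\n",
"            nodes.append((prefix + key, 'hyperparameter'))\n",
"    return nodes\n",
"\n",
"\n",
"HDL_small = {\n",
"    'Classifier(choice)': {\n",
"        'SVM': {'C': {'_type': 'loguniform', '_value': [0.01, 10000], '_default': 1.0}},\n",
"        'LDA': {'n_components': {'_type': 'int_uniform', '_value': [2, 9], '_default': 2}},\n",
"    }\n",
"}\n",
"\n",
"for path, kind in classify_hdl_nodes(HDL_small):\n",
"    print(kind, path)\n",
"```\n",
"\n",
"For `HDL_small` (a copy of the first HDL above) this reports one choice node (`Classifier`), two spaces (`Classifier:SVM`, `Classifier:LDA`) and two hyperparameters, matching the node shapes described above."
]
},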
{
"cell_type": "markdown",
"metadata": {},
"source": [
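"Inside the `SVM` hyperparameter space, we can nest further algorithm selection variables. Let's use HDL to define a more complex hierarchical configuration space:"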
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"HDL = {\n",
" \"Classifier(choice)\":{\n",
" \"SVM\": {\n",
" \"C\": {\"_type\": \"loguniform\", \"_value\": [0.01, 10000], \"_default\": 1.0},\n",
" \"kernel(choice)\": {\n",
" \"poly\": {\n",
" \"degree\": {\"_type\": \"int_uniform\", \"_value\": [2, 5], \"_default\": 3}},\n",
" \"rbf\": {\n",
" \"coef0\": {\"_type\": \"quniform\", \"_value\": [-1, 1], \"_default\": 0}},\n",
" \"sigmoid\": {\n",
" \"coef0\": {\"_type\": \"quniform\", \"_value\": [-1, 1], \"_default\": 0}},\n",
" }\n",
" },\n",
" \"LDA\": {\n",
" \"n_components\": {\"_type\": \"int_uniform\", \"_value\": [2, 9], \"_default\": 2},\n",
" \"solver(choice)\": {\n",
" \"lsqr\": {\n",
" \"shrinkage\": {\"_type\": \"choice\", \"_value\": [True, False]}},\n",
" \"eigen\": {\n",
" \"shrinkage\": {\"_type\": \"choice\", \"_value\": [True, False]}},\n",
" \"svd\": {},\n",
" }\n",
" }\n",
" } \n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualize this HDL:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"CS = hdl2cs(HDL)\n",
"plot_hdl(HDL)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Layering Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
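"UltraOpt optimizes over a configuration space (a ConfigSpace object), which is described with the hyperparameter description language (HDL, a dict object). Sampling from this space yields a configuration (config, a Configuration or dict object).\n",
"\n",
"Let's take a configuration from the space defined above, here its default configuration:"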
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Classifier:__choice__': 'SVM',\n",
" 'Classifier:SVM:C': 1.0,\n",
" 'Classifier:SVM:kernel:__choice__': 'poly',\n",
" 'Classifier:SVM:kernel:poly:degree': 3}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"config = CS.get_default_configuration().get_dictionary()  # default configuration of the space, as a dict\n",
"config"
]
},
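{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why does the configuration above contain only `SVM` and `poly` keys? Sampling a hierarchical space descends only into the candidate picked at each `(choice)` node, so the hyperparameters of the other candidates never appear. The sketch below illustrates this idea in plain Python. `sample_hdl` is a hypothetical helper, not UltraOpt's sampler: it draws uniformly at random, supports only the `_type`s used in this tutorial, ignores `_default`, and assumes a step of 1 for `quniform` when no step is given (consistent with the `Q: 1.0` printed for `coef0` later in this tutorial). Its flat `:`-separated keys and `__choice__` entries imitate the output above.\n",
"\n",
"```python\n",
"import math\n",
"import random\n",
"\n",
"\n",
"def sample_value(spec, rng):\n",
"    # Draw one value for a leaf hyperparameter (only the _types used in this tutorial).\n",
"    kind, bounds = spec['_type'], spec['_value']\n",
"    if kind == 'choice':\n",
"        return rng.choice(bounds)\n",
"    if kind == 'loguniform':\n",
"        return math.exp(rng.uniform(math.log(bounds[0]), math.log(bounds[1])))\n",
"    if kind == 'int_uniform':\n",
"        return rng.randint(bounds[0], bounds[1])\n",
"    if kind == 'quniform':\n",
"        q = bounds[2] if len(bounds) > 2 else 1  # assumption: default step of 1\n",
"        return round(rng.uniform(bounds[0], bounds[1]) / q) * q\n",
"    raise ValueError('unsupported _type: ' + kind)\n",
"\n",
"\n",
"def sample_hdl(hdl, rng, prefix=''):\n",
"    # Return a flat dict whose keys join the hierarchy levels with ':'.\n",
"    config = {}\n",
"    for key, value in hdl.items():\n",
"        if key.endswith('(choice)'):\n",
"            name = prefix + key[:-len('(choice)')]\n",
"            picked = rng.choice(sorted(value))  # pick one candidate space\n",
"            config[name + ':__choice__'] = picked\n",
"            # only the picked branch is sampled; the other candidates stay inactive\n",
"            config.update(sample_hdl(value[picked], rng, name + ':' + picked + ':'))\n",
"        else:\n",
"            config[prefix + key] = sample_value(value, rng)\n",
"    return config\n",
"\n",
"\n",
"# a trimmed copy of the HDL above\n",
"HDL_svm_lda = {\n",
"    'Classifier(choice)': {\n",
"        'SVM': {\n",
"            'C': {'_type': 'loguniform', '_value': [0.01, 10000]},\n",
"            'kernel(choice)': {\n",
"                'poly': {'degree': {'_type': 'int_uniform', '_value': [2, 5]}},\n",
"                'rbf': {'coef0': {'_type': 'quniform', '_value': [-1, 1]}},\n",
"            },\n",
"        },\n",
"        'LDA': {'n_components': {'_type': 'int_uniform', '_value': [2, 9]}},\n",
"    }\n",
"}\n",
"\n",
"rng = random.Random(0)\n",
"for _ in range(3):\n",
"    print(sample_hdl(HDL_svm_lda, rng))\n",
"```"
]
},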
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, this dict is flat: the levels of each key are separated by `:`.\n",
"\n",
"We can use `ultraopt.hdl.layering_config` to turn the flat `config` into a layered dict, `layered_dict`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"layered_dict = layering_config(config)"
]
},
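{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, layering splits each flat key on `:` and nests the values level by level. The sketch below shows that idea in plain Python. `layer_flat_config` is a hypothetical re-implementation for illustration only; the exact structure returned by `ultraopt.hdl.layering_config` may differ, for example in how it handles the `__choice__` entries.\n",
"\n",
"```python\n",
"def layer_flat_config(flat_config, sep=':'):\n",
"    # Hypothetical sketch: nest a flat config by splitting its keys on `sep`.\n",
"    layered = {}\n",
"    for flat_key, value in flat_config.items():\n",
"        *parents, leaf = flat_key.split(sep)\n",
"        node = layered\n",
"        for part in parents:\n",
"            node = node.setdefault(part, {})  # create intermediate levels on demand\n",
"        node[leaf] = value\n",
"    return layered\n",
"\n",
"\n",
"# the flat default configuration printed above\n",
"flat = {\n",
"    'Classifier:__choice__': 'SVM',\n",
"    'Classifier:SVM:C': 1.0,\n",
"    'Classifier:SVM:kernel:__choice__': 'poly',\n",
"    'Classifier:SVM:kernel:poly:degree': 3,\n",
"}\n",
"print(layer_flat_config(flat))\n",
"# {'Classifier': {'__choice__': 'SVM', 'SVM': {'C': 1.0, 'kernel': {'__choice__': 'poly', 'poly': {'degree': 3}}}}}\n",
"```"
]
},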
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can visualize `layered_dict` with `ultraopt.hdl.plot_layered_dict`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"plot_layered_dict(layered_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advanced Conditional Statements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The activate statement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
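"We may want to define conditional hyperparameters more flexibly, for example by placing two conditional hyperparameters on the same level.\n",
"\n",
"Taking the `SVM` above as an example, we want the hyperparameters `degree` and `coef0` to sit on the same level as `kernel` while keeping their original dependencies, as shown in the figure:\n",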
"\n",
"![sample-layer](https://img-blog.csdnimg.cn/20210103112834860.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's lay out how the other `SVM` variables depend on `kernel`:\n",
"\n",
"| `kernel` type | Parameters used | Kernel formula |\n",
"|--------------|--------------|------------|\n",
"| `rbf`|`gamma`| $K\\left(\\mathbf{x}, \\mathbf{x}^{\\prime}\\right)=\\exp \\left(-\\frac{\\left\\|\\mathbf{x}-\\mathbf{x}^{\\prime}\\right\\|^{2}}{2 \\sigma^{2}}\\right), \\gamma=\\frac{1}{2 \\sigma^{2}}$|\n",
"|`sigmoid`|`gamma`, `coef0`|$K\\left(\\mathbf{x}, \\mathbf{x}^{\\prime}\\right)=\\tanh\\left(\\gamma \\cdot \\mathbf{x}^T \\mathbf{x}^{\\prime} + coef_0\\right)$|\n",
"|`poly`|`gamma`, `coef0`, `degree`|$K\\left(\\mathbf{x}, \\mathbf{x}^{\\prime}\\right)=\\left(\\gamma \\cdot \\mathbf{x}^T \\mathbf{x}^{\\prime} + coef_0\\right)^{d}, d=\\mathrm{degree}$|"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From this table, we can think of the dependencies as activation relationships:\n",
"\n",
"- `kernel`=`rbf` $\\rightarrow$ activates `[\"gamma\"]`\n",
"\n",
"- `kernel`=`sigmoid` $\\rightarrow$ activates `[\"gamma\", \"coef0\"]`\n",
"\n",
"- `kernel`=`poly` $\\rightarrow$ activates `[\"degree\", \"gamma\", \"coef0\"]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Following this logic, we added the `activate` statement to HDL. It is written as follows:\n",
"\n",
"```python\n",
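"\"__activate\": {  # keys starting with two underscores are special statements, not hyperparameter or space definitions\n",
"    \"activating_variable\": {\n",
"        \"value_1_of_activating_variable\": [\n",
"            # variables activated when activating_variable == value_1_of_activating_variable\n",
"            \"activated_variable_1\",\n",
"            \"activated_variable_2\",\n",
"            ...\n",
"        ],\n",
"        \"value_2_of_activating_variable\": ...,\n",
"        ...,\n",
"    }\n",
"}\n",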
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, let's use the `activate` statement to define the dependencies among the SVM variables:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"HDL_activate = {\n",
" \"kernel\": {\"_type\": \"choice\", \"_value\": [\"rbf\", \"poly\", \"sigmoid\"], \"_default\": \"rbf\"},\n",
" \"degree\": {\"_type\": \"int_uniform\", \"_value\": [2, 5], \"_default\": 3},\n",
" \"gamma\": {\"_type\": \"loguniform\", \"_value\": [1e-05, 8], \"_default\": 0.1},\n",
" \"coef0\": {\"_type\": \"quniform\", \"_value\": [-1, 1], \"_default\": 0},\n",
" \"__activate\": {\n",
" \"kernel\": {\n",
" \"rbf\": [\"gamma\"],\n",
" \"sigmoid\": [\"gamma\", \"coef0\"],\n",
" \"poly\": [\"degree\", \"gamma\", \"coef0\"]\n",
" }\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Configuration space object:\n",
" Hyperparameters:\n",
" coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0, Q: 1.0\n",
" degree, Type: UniformInteger, Range: [2, 5], Default: 3\n",
" gamma, Type: UniformFloat, Range: [1e-05, 8.0], Default: 0.1, on log-scale\n",
" kernel, Type: Categorical, Choices: {rbf, poly, sigmoid}, Default: rbf\n",
" Conditions:\n",
" coef0 | kernel in {'poly', 'sigmoid'}\n",
" degree | kernel == 'poly'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_activate = hdl2cs(HDL_activate)\n",
"CS_activate"
]
},
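{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Conditions` printed above can be derived by inverting the `__activate` mapping: for each activated variable, collect the `kernel` values that activate it. A variable activated by every value (here `gamma`) needs no condition at all, which is why `gamma` has none. The sketch below performs this inversion in plain Python and emits dicts in the `__condition` format. `activate_to_conditions` is a hypothetical helper, not UltraOpt's internal code, and it assumes the full list of values of each activating variable is passed in explicitly.\n",
"\n",
"```python\n",
"def activate_to_conditions(activate, choices):\n",
"    # Hypothetical sketch: turn an __activate mapping into __condition-style dicts.\n",
"    # activate: {parent: {parent_value: [activated children]}}\n",
"    # choices:  {parent: [every value the parent can take]}\n",
"    conditions = []\n",
"    for parent, value_to_children in activate.items():\n",
"        activating_values = {}  # child -> parent values that activate it\n",
"        for parent_value, children in value_to_children.items():\n",
"            for child in children:\n",
"                activating_values.setdefault(child, []).append(parent_value)\n",
"        for child, values in activating_values.items():\n",
"            if set(values) == set(choices[parent]):\n",
"                continue  # active for every parent value, so no condition is needed\n",
"            conditions.append({'_child': child, '_parent': parent, '_values': sorted(values)})\n",
"    return conditions\n",
"\n",
"\n",
"activate = {\n",
"    'kernel': {\n",
"        'rbf': ['gamma'],\n",
"        'sigmoid': ['gamma', 'coef0'],\n",
"        'poly': ['degree', 'gamma', 'coef0'],\n",
"    }\n",
"}\n",
"for condition in activate_to_conditions(activate, {'kernel': ['rbf', 'poly', 'sigmoid']}):\n",
"    print(condition)\n",
"# {'_child': 'coef0', '_parent': 'kernel', '_values': ['poly', 'sigmoid']}\n",
"# {'_child': 'degree', '_parent': 'kernel', '_values': ['poly']}\n",
"```"
]
},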
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check whether a configuration from this space, here its default configuration, satisfies the `activate` statement:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Configuration:\n",
" gamma, Value: 0.1\n",
" kernel, Value: 'rbf'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_activate.get_default_configuration()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here `kernel = rbf`, so only `gamma` is activated, as the `activate` statement requires."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Moreover, all variables of this configuration space sit on the same level, as we wanted:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"plot_hdl(HDL_activate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The condition statement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
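"Let's continue with the `SVM` example. Above, we described the relationships between variables in terms of **activation**; can we describe them in terms of **dependency** instead?\n",
"\n",
"We observe that:\n",
"\n",
"- Every value of `kernel` activates `gamma`, so `gamma` is always present.\n",
"- `coef0` is activated when `kernel` is `'poly'` or `'sigmoid'`.\n",
"- `degree` is activated when `kernel` is `'poly'`.\n",
"\n",
"Based on this, we added the `condition` statement to HDL. It is written as follows:\n",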
"\n",
"```python\n",
"\"__condition\": [  # the key is \"__condition\"; the value is a list,\n",
"                   # and each list item is a dict\n",
"    {\n",
"        # each dict has three keys: _child, _parent, _values\n",
"        \"_child\": \"coef0\",  # the dependent variable\n",
"        \"_parent\": \"kernel\",  # the variable it depends on\n",
"        \"_values\": [\"poly\", \"sigmoid\"],  # values of _parent under which _child is active\n",
"    },\n",
"    ...\n",
"]\n",
"```"
]
},
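{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what a `__condition` list means for a single configuration, the sketch below decides which hyperparameters are active: a hyperparameter is active if it has no condition, or if its `_parent` is active and takes one of the listed `_values`. `active_hyperparameters` is a hypothetical helper for illustration, not UltraOpt's API. It accepts `_values` either as a list or as a single value (the example below uses both forms) and assumes at most one condition per child.\n",
"\n",
"```python\n",
"def active_hyperparameters(candidate, conditions):\n",
"    # Hypothetical sketch: names in `candidate` that are active under `conditions`.\n",
"    # candidate: a value for every hyperparameter, active or not\n",
"    # conditions: list of dicts with the keys _child, _parent and _values\n",
"    parent_of = {}\n",
"    for cond in conditions:\n",
"        values = cond['_values']\n",
"        if not isinstance(values, (list, tuple, set)):\n",
"            values = [values]  # allow a single value such as 'poly'\n",
"        parent_of[cond['_child']] = (cond['_parent'], set(values))\n",
"\n",
"    def is_active(name):\n",
"        if name not in parent_of:\n",
"            return True  # unconditional hyperparameter\n",
"        parent, values = parent_of[name]\n",
"        # active only if the parent is itself active and takes an allowed value\n",
"        return is_active(parent) and candidate[parent] in values\n",
"\n",
"    return sorted(name for name in candidate if is_active(name))\n",
"\n",
"\n",
"conditions = [\n",
"    {'_child': 'coef0', '_parent': 'kernel', '_values': ['poly', 'sigmoid']},\n",
"    {'_child': 'degree', '_parent': 'kernel', '_values': 'poly'},\n",
"]\n",
"candidate = {'kernel': 'rbf', 'degree': 3, 'gamma': 0.1, 'coef0': 0.0}\n",
"print(active_hyperparameters(candidate, conditions))  # ['gamma', 'kernel']\n",
"candidate['kernel'] = 'sigmoid'\n",
"print(active_hyperparameters(candidate, conditions))  # ['coef0', 'gamma', 'kernel']\n",
"```"
]
},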
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, let's use the `condition` statement to define the dependencies among the SVM variables:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"HDL_condition = {\n",
" \"kernel\": {\"_type\": \"choice\", \"_value\": [\"rbf\", \"poly\", \"sigmoid\"], \"_default\": \"rbf\"},\n",
" \"degree\": {\"_type\": \"int_uniform\", \"_value\": [2, 5], \"_default\": 3},\n",
" \"gamma\": {\"_type\": \"loguniform\", \"_value\": [1e-05, 8], \"_default\": 0.1},\n",
" \"coef0\": {\"_type\": \"quniform\", \"_value\": [-1, 1], \"_default\": 0},\n",
" \"__condition\": [\n",
" {\n",
" \"_child\": \"coef0\",\n",
" \"_parent\": \"kernel\",\n",
" \"_values\": [\"poly\", \"sigmoid\"],\n",
" },\n",
" {\n",
" \"_child\": \"degree\",\n",
" \"_parent\": \"kernel\",\n",
" \"_values\": \"poly\",\n",
" },\n",
" ]\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Configuration space object:\n",
" Hyperparameters:\n",
" coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0, Q: 1.0\n",
" degree, Type: UniformInteger, Range: [2, 5], Default: 3\n",
" gamma, Type: UniformFloat, Range: [1e-05, 8.0], Default: 0.1, on log-scale\n",
" kernel, Type: Categorical, Choices: {rbf, poly, sigmoid}, Default: rbf\n",
" Conditions:\n",
" coef0 | kernel in {'poly', 'sigmoid'}\n",
" degree | kernel == 'poly'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_condition = hdl2cs(HDL_condition)\n",
"CS_condition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both approaches produce the same configuration space:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_condition == CS_activate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The forbidden statement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
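"Some combinations of values are invalid. For example, `LinearSVC` does not support `penalty == \"l1\"` together with `loss == \"hinge\"`. In such cases we can use the `forbidden` statement to rule those combinations out."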
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We added the `forbidden` statement to HDL. It is written as follows:\n",
"\n",
"```python\n",
"\"__forbidden\": [  # the key is \"__forbidden\"; the value is a list,\n",
"                   # and each list item is a dict\n",
"    {\n",
"        # forbid this combination of values from occurring together:\n",
"        \"penalty\": \"l1\",\n",
"        \"loss\": \"hinge\",\n",
"        ...\n",
"    },\n",
"    ...\n",
"]\n",
"```"
]
},
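{
"cell_type": "markdown",
"metadata": {},
"source": [
"A forbidden clause rules out a configuration when the configuration matches every key-value pair in that clause. The sketch below implements this check and draws legal configurations by rejection sampling (redraw until no clause matches). Rejection sampling is chosen here only for illustration and is not necessarily how UltraOpt or ConfigSpace sample; `is_forbidden` and `sample_legal` are hypothetical helpers, and the example uses only the first two clauses of the `LinearSVC` HDL that follows.\n",
"\n",
"```python\n",
"import itertools\n",
"import random\n",
"\n",
"\n",
"def is_forbidden(config, forbidden_clauses):\n",
"    # A clause forbids `config` if config matches every key-value pair in the clause.\n",
"    return any(\n",
"        all(config.get(name) == value for name, value in clause.items())\n",
"        for clause in forbidden_clauses\n",
"    )\n",
"\n",
"\n",
"def sample_legal(choices, forbidden_clauses, rng, max_tries=1000):\n",
"    # Rejection sampling: redraw uniformly until no forbidden clause matches.\n",
"    for _ in range(max_tries):\n",
"        config = {name: rng.choice(values) for name, values in choices.items()}\n",
"        if not is_forbidden(config, forbidden_clauses):\n",
"            return config\n",
"    raise RuntimeError('no legal configuration found')\n",
"\n",
"\n",
"choices = {'penalty': ['l1', 'l2'], 'dual': [True, False], 'loss': ['hinge', 'squared_hinge']}\n",
"forbidden = [\n",
"    {'penalty': 'l1', 'loss': 'hinge'},\n",
"    {'penalty': 'l2', 'dual': False, 'loss': 'hinge'},\n",
"]\n",
"\n",
"# enumerate all 2 * 2 * 2 = 8 combinations and keep the legal ones\n",
"names = list(choices)\n",
"combos = [dict(zip(names, values)) for values in itertools.product(*choices.values())]\n",
"legal = [config for config in combos if not is_forbidden(config, forbidden)]\n",
"print(len(legal))  # 5\n",
"print(sample_legal(choices, forbidden, random.Random(0)))\n",
"```"
]
},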
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"HDL_forbidden = {\n",
" \"max_iter\": {\"_type\": \"int_quniform\", \"_value\": [300, 3000, 100], \"_default\": 600},\n",
" \"penalty\": {\"_type\": \"choice\", \"_value\": [\"l1\", \"l2\"], \"_default\": \"l2\"},\n",
" \"dual\": {\"_type\": \"choice\", \"_value\": [True, False], \"_default\": False},\n",
" \"loss\": {\"_type\": \"choice\", \"_value\": [\"hinge\", \"squared_hinge\"], \"_default\": \"squared_hinge\"},\n",
" \"C\": {\"_type\": \"loguniform\", \"_value\": [0.01, 10000], \"_default\": 1.0},\n",
" \"__forbidden\": [\n",
" {\"penalty\": \"l1\", \"loss\": \"hinge\"},\n",
" {\"penalty\": \"l2\", \"dual\": False, \"loss\": \"hinge\"},\n",
" {\"penalty\": \"l1\", \"dual\": False},\n",
" {\"penalty\": \"l1\", \"dual\": True, \"loss\": \"squared_hinge\"},\n",
" ]\n",
"} "
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Configuration space object:\n",
" Hyperparameters:\n",
" C, Type: UniformFloat, Range: [0.01, 10000.0], Default: 1.0, on log-scale\n",
" dual, Type: Categorical, Choices: {True:bool, False:bool}, Default: True:bool\n",
" loss, Type: Categorical, Choices: {hinge, squared_hinge}, Default: squared_hinge\n",
" max_iter, Type: UniformInteger, Range: [300, 3000], Default: 600, Q: 100\n",
" penalty, Type: Categorical, Choices: {l1, l2}, Default: l2\n",
" Forbidden Clauses:\n",
" (Forbidden: penalty == 'l1' && Forbidden: loss == 'hinge')\n",
" (Forbidden: penalty == 'l2' && Forbidden: dual == 'False:bool' && Forbidden: loss == 'hinge')\n",
" (Forbidden: penalty == 'l1' && Forbidden: dual == 'False:bool')\n",
" (Forbidden: penalty == 'l1' && Forbidden: dual == 'True:bool' && Forbidden: loss == 'squared_hinge')"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_forbidden = hdl2cs(HDL_forbidden)\n",
"CS_forbidden"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Configuration:\n",
" C, Value: 3.915967916341741\n",
" dual, Value: 'True:bool'\n",
" loss, Value: 'hinge'\n",
" max_iter, Value: 1900\n",
" penalty, Value: 'l2'"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CS_forbidden.sample_configuration()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**References**\n",
"\n",
"---\n",
"\n",
"<div id=\"refer-anchor-1\"></div>\n",
"\n",
"- [1] [Thornton, Chris, et al. “Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms.” Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (2013).](https://arxiv.org/abs/1208.3719)\n",
"\n",
"<div id=\"refer-anchor-2\"></div>\n",
"\n",
"- [2] [Zoller, Marc-Andre, and Marco F. Huber. “Benchmark and Survey of Automated Machine Learning Frameworks.” arXiv preprint arXiv:1904.12054 (2019).](https://arxiv.org/abs/1904.12054)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "auto-sklearn",
"language": "python",
"name": "auto-sklearn"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}