Model Selection and Experimentation Automation with LLMs

  • 2024-11-14

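Large language models (LLMs) have become everyday tools. From answering questions to generating task lists, they simplify our work in many ways, and individuals and businesses now use them routinely to get work done.

Code generation and evaluation have recently become key features of many commercial products that help developers work with code. LLMs can also be taken a step further and applied to data science work, especially model selection and experimentation.

This article discusses how to automate model selection and experimentation with an LLM.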

Model Selection and Experimentation Automation with LLMs

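We will set up the dataset used for model training and the code used for the automation. In this example, we use a dataset from Kaggle, saved locally as fraud_data.csv. Below is the preparation I did for the preprocessing step.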

import pandas as pd

df = pd.read_csv('fraud_data.csv')
df = df.drop(['trans_date_trans_time', 'merchant', 'dob', 'trans_num', 'merch_lat', 'merch_long'], axis=1)
df = df.dropna().reset_index(drop=True)
df.to_csv('fraud_data.csv', index=False)

We keep only some of the columns and drop every row with missing data. This is not the optimal process, but our focus here is on model selection and experimentation.

Next, we prepare a folder for the project and put all related files there. First, we create a requirements.txt file for the environment. You can fill it with the packages below.

openai
pandas
scikit-learn
pyyaml

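Next, we use a YAML file for all the related metadata. This includes the OpenAI API key, the models to test, the evaluation metrics, and the location of the dataset.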

llm_api_key: "YOUR-OPENAI-API-KEY"
default_models:
  - LogisticRegression
  - DecisionTreeClassifier
  - RandomForestClassifier
metrics: ["accuracy", "precision", "recall", "f1_score"]
dataset_path: "fraud_data.csv"
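Keeping the API key in config.yaml is convenient, but the file can easily end up in version control. As one possible alternative, the sketch below prefers an environment variable and falls back to the YAML value. `resolve_api_key` is a hypothetical helper introduced here, not part of the original script; `OPENAI_API_KEY` is the variable name the OpenAI SDK itself looks for by default.

```python
import os


def resolve_api_key(config, env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from the config dict."""
    key = os.environ.get(env_var) or config.get("llm_api_key")
    # Treat the placeholder from the sample config as "not set".
    if not key or key == "YOUR-OPENAI-API-KEY":
        raise ValueError(f"Set {env_var} or fill in llm_api_key in config.yaml")
    return key
```

In the script, the result of `resolve_api_key(config)` could be passed wherever `config['llm_api_key']` is used.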

Then we import the packages used in this process. We rely on Scikit-Learn for the modeling and use OpenAI's GPT-4 as the LLM.

import pandas as pd
import yaml
import ast
import re
import sklearn
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

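In addition, we set up helper functions and supporting information for the process. The functions below cover loading the configuration, loading the dataset, and preprocessing the data.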

model_mapping = {
    "LogisticRegression": LogisticRegression,
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "RandomForestClassifier": RandomForestClassifier
}

def load_config(config_path='config.yaml'):
    with open(config_path, 'r') as file:
        config = yaml.safe_load(file)
    return config

def load_data(dataset_path):
    return pd.read_csv(dataset_path)

def preprocess_data(df):
    label_encoders = {}
    # Encode every string column as integers and keep the fitted encoders
    for column in df.select_dtypes(include=['object']).columns:
        le = LabelEncoder()
        df[column] = le.fit_transform(df[column])
        label_encoders[column] = le
    return df, label_encoders
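A typo in default_models only surfaces later, when the training function rejects the name. A small check right after loading the configuration can catch it earlier. This is a minimal sketch: `validate_config` and `REQUIRED_KEYS` are hypothetical names, and the key list simply mirrors the sample config.yaml.

```python
REQUIRED_KEYS = ("llm_api_key", "default_models", "metrics", "dataset_path")


def validate_config(config, known_models):
    """Return a list of problems found in the loaded config (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for name in config.get("default_models", []):
        if name not in known_models:
            problems.append(f"unknown model: {name}")
    return problems


# Stand-in values; in the script, pass model_mapping as known_models.
config = {
    "llm_api_key": "YOUR-OPENAI-API-KEY",
    "default_models": ["LogisticRegression", "RandomForest"],
    "metrics": ["accuracy"],
    "dataset_path": "fraud_data.csv",
}
known = {"LogisticRegression": None, "RandomForestClassifier": None}
print(validate_config(config, known))  # ['unknown model: RandomForest']
```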

In the same file, we set up the LLM to play the role of a machine learning expert. We use the code below to call it.

def call_llm(prompt, api_key):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert in machine learning and able to evaluate the model well."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content.strip()

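You can swap the LLM for any model you prefer, such as an open-source model from HuggingFace, but we suggest sticking with OpenAI for now.

The code below defines functions that clean up the LLM's results. They ensure the output can be used in the later model selection and experimentation steps.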

def clean_hyperparameter_suggestion(suggestion):
    # Grab the first {...} span from the LLM reply
    pattern = r'\{.*?\}'
    match = re.search(pattern, suggestion, re.DOTALL)
    if match:
        cleaned_suggestion = match.group(0)
        return cleaned_suggestion
    else:
        print("Could not find a dictionary in the hyperparameter suggestion.")
        return None

def extract_model_name(llm_response, available_models):
    # Return the first configured model name mentioned in the reply
    for model in available_models:
        pattern = r'\b' + re.escape(model) + r'\b'
        if re.search(pattern, llm_response, re.IGNORECASE):
            return model
    return None

def validate_hyperparameters(model_class, hyperparameters):
    valid_params = model_class().get_params()
    invalid_params = []
    for param, value in hyperparameters.items():
        if param not in valid_params:
            invalid_params.append(param)
        else:
            if param == 'max_features' and value == 'auto':
                print(f"Invalid value for parameter '{param}': '{value}'")
                invalid_params.append(param)
    if invalid_params:
        print(f"Invalid hyperparameters for {model_class.__name__}: {invalid_params}")
        return False
    return True

def correct_hyperparameters(hyperparameters, model_name):
    corrected = False
    if model_name == "RandomForestClassifier":
        if 'max_features' in hyperparameters and hyperparameters['max_features'] == 'auto':
            print("Correcting 'max_features' from 'auto' to 'sqrt' for RandomForestClassifier.")
            hyperparameters['max_features'] = 'sqrt'
            corrected = True
    return hyperparameters, corrected
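One limitation worth knowing: the non-greedy pattern `\{.*?\}` in clean_hyperparameter_suggestion stops at the first closing brace. A suggestion that contains a nested dictionary (for example a `class_weight` mapping) is therefore cut short and fails to parse. The sketch below shows one possible workaround that scans for the matching brace instead. `extract_balanced_dict` is a hypothetical helper, not part of the original script, and it does not handle braces that appear inside string values.

```python
import ast


def extract_balanced_dict(text):
    """Return the first brace-balanced {...} span in text, or None."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "{":
            depth += 1
        elif text[index] == "}":
            depth -= 1
            if depth == 0:
                return text[start:index + 1]
    return None  # the braces never closed


# Made-up reply with a nested dictionary.
reply = "Try this: {'n_estimators': 200, 'class_weight': {0: 1, 1: 10}}"
params = ast.literal_eval(extract_balanced_dict(reply))
print(params["class_weight"])  # {0: 1, 1: 10}
```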

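Then we need a function that runs model training and evaluation. The code below trains a model given the split datasets, the name of the model to look up in the mapping, and optional hyperparameters. It returns the metrics and the model object.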

def train_and_evaluate(X_train, X_test, y_train, y_test, model_name, hyperparameters=None):
    if model_name not in model_mapping:
        print(f"Valid model names are: {list(model_mapping.keys())}")
        return None, None

    model_class = model_mapping.get(model_name)
    try:
        if hyperparameters:
            hyperparameters, corrected = correct_hyperparameters(hyperparameters, model_name)
            if not validate_hyperparameters(model_class, hyperparameters):
                return None, None
            model = model_class(**hyperparameters)
        else:
            model = model_class()
    except Exception as e:
        print(f"Error instantiating model with hyperparameters: {e}")
        return None, None

    try:
        model.fit(X_train, y_train)
    except Exception as e:
        print(f"Error during model fitting: {e}")
        return None, None

    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average='weighted', zero_division=0),
        "recall": recall_score(y_test, y_pred, average='weighted', zero_division=0),
        "f1_score": f1_score(y_test, y_pred, average='weighted', zero_division=0)
    }
    return metrics, model
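Because train_and_evaluate uses average='weighted', each class's score is weighted by its support. Two consequences follow: weighted recall always equals accuracy, and on an imbalanced target (as fraud labels often are) a model can score well while missing most of the minority class. The pure-Python sketch below reproduces the arithmetic on made-up labels; the data is illustrative and not taken from the fraud dataset, and the helper names are introduced here for the example.

```python
def recall_per_class(y_true, y_pred):
    """Recall for each class: correct predictions / true members of the class."""
    scores = {}
    for label in sorted(set(y_true)):
        members = [p for t, p in zip(y_true, y_pred) if t == label]
        scores[label] = sum(p == label for p in members) / len(members)
    return scores


def weighted_recall(y_true, y_pred):
    """Average the per-class recalls, weighted by each class's support."""
    per_class = recall_per_class(y_true, y_pred)
    return sum(per_class[c] * y_true.count(c) for c in per_class) / len(y_true)


# Illustrative labels: 9 legitimate (0), 1 fraud (1); the model predicts all 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.9
print(weighted_recall(y_true, y_pred))   # 0.9
print(recall_per_class(y_true, y_pred))  # {0: 1.0, 1: 0.0}
```

If the fraud class matters most, and assuming is_fraud is coded 0/1, passing average='binary' (scikit-learn's default for these metric functions) reports the score for the positive class only.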

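With everything in place, we can set up the automation. There are several steps we can automate, including:

1. Train and evaluate all models

2. Let the LLM select the best model

3. Ask for hyperparameter tuning suggestions for the best model

4. If the LLM suggests hyperparameters, run the tuning automatically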

def run_llm_based_model_selection_experiment(df, config):
    # Model training
    X = df.drop("is_fraud", axis=1)
    y = df["is_fraud"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    available_models = config['default_models']
    model_performance = {}

    for model_name in available_models:
        print(f"Training model: {model_name}")
        metrics, _ = train_and_evaluate(X_train, X_test, y_train, y_test, model_name)
        model_performance[model_name] = metrics
        print(f"Model: {model_name} | Metrics: {metrics}")

    # LLM selecting the best model
    sklearn_version = sklearn.__version__
    prompt = (
        f"I have trained the following models with these metrics: {model_performance}. "
        "Which model should I select based on the best performance?"
    )
    best_model_response = call_llm(prompt, config['llm_api_key'])
    print(f"LLM response for best model selection:\n{best_model_response}")

    best_model = extract_model_name(best_model_response, available_models)
    if not best_model:
        print("Error: Could not extract a valid model name from LLM response.")
        return
    print(f"LLM selected the best model: {best_model}")

    # Check for hyperparameter tuning
    prompt_tuning = (
        f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
        "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
        f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
        "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
        "Don't provide any explanation or return in any other format."
    )
    tuning_suggestion = call_llm(prompt_tuning, config['llm_api_key'])
    print(f"Hyperparameter tuning suggestion received:\n{tuning_suggestion}")

    cleaned_suggestion = clean_hyperparameter_suggestion(tuning_suggestion)
    if cleaned_suggestion is None:
        suggested_params = None
    else:
        try:
            suggested_params = ast.literal_eval(cleaned_suggestion)
            if not isinstance(suggested_params, dict):
                print("Hyperparameter suggestion is not a valid dictionary.")
                suggested_params = None
        except (ValueError, SyntaxError) as e:
            print(f"Error parsing hyperparameter suggestion: {e}")
            suggested_params = None

    # Automatically run hyperparameter tuning if suggested
    if suggested_params:
        print(f"Running {best_model} with suggested hyperparameters: {suggested_params}")
        tuned_metrics, _ = train_and_evaluate(
            X_train, X_test, y_train, y_test, best_model, hyperparameters=suggested_params
        )
        print(f"Metrics after tuning: {tuned_metrics}")
    else:
        print("No valid hyperparameters were provided for tuning.")

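In the code above, I specify how the LLM evaluates each of our models based on the experiment. We use the following prompt to select a model according to its performance.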

prompt = (
    f"I have trained the following models with these metrics: {model_performance}. "
    "Which model should I select based on the best performance?"
)

You can always change the prompt to apply different rules for model selection.
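For example, on a fraud problem you might want the LLM to rank models by recall rather than by overall performance. A minimal sketch of a parameterized prompt follows. `build_selection_prompt` and its `priority_metric` argument are hypothetical names introduced here, and the metric values are placeholders, not results.

```python
def build_selection_prompt(model_performance, priority_metric=None):
    """Build the model-selection prompt, optionally naming one metric to favor."""
    prompt = (
        f"I have trained the following models with these metrics: {model_performance}. "
    )
    if priority_metric:
        prompt += (
            f"Which model should I select if {priority_metric} matters most? "
            "Answer with the model name only."
        )
    else:
        prompt += "Which model should I select based on the best performance?"
    return prompt


# Placeholder numbers, only to show the resulting text.
performance = {
    "LogisticRegression": {"recall": 0.90},
    "DecisionTreeClassifier": {"recall": 0.95},
}
print(build_selection_prompt(performance, priority_metric="recall"))
```

Asking for the model name only also makes the reply easier for extract_model_name to match.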

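Once the best model is selected, I use the following prompt to ask which hyperparameters should be used in the next step. I also state the Scikit-Learn version, because the valid hyperparameters change between versions.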

prompt_tuning = (
    f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
    "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
    f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
    "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
    "Don't provide any explanation or return in any other format."
)

You can change the prompt in any way you like, for example by tuning the hyperparameters more aggressively or by adding another technique.
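One way to add another technique is to ask the LLM for a few candidate values per hyperparameter instead of a single set, then try every combination. The sketch below covers only the expansion step, using the standard library. `expand_grid` is a hypothetical helper, and the grid literal stands in for a dictionary parsed from the LLM reply.

```python
from itertools import product


def expand_grid(grid):
    """Turn {'param': [v1, v2], ...} into a list of one-value-per-param dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


# Stand-in for a dictionary parsed from the LLM reply with ast.literal_eval.
grid = {"max_depth": [5, 10], "min_samples_split": [2, 4]}
for candidate in expand_grid(grid):
    print(candidate)
```

Each resulting dictionary could then be passed to train_and_evaluate through its hyperparameters argument. scikit-learn's ParameterGrid provides the same expansion if you prefer a library call.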

I put all the code above in a file named automated_model_llm.py. Finally, add the following code to run the whole process.

def main():
    config = load_config()
    df = load_data(config['dataset_path'])
    df, _ = preprocess_data(df)
    run_llm_based_model_selection_experiment(df, config)

if __name__ == "__main__":
    main()

Once everything is ready, run the following command to execute the script.

python automated_model_llm.py

Output:

LLM selected the best model: RandomForestClassifier
Hyperparameter tuning suggestion received:
{'n_estimators': 100,
'max_depth': None,
'min_samples_split': 2,
'min_samples_leaf': 1,
'max_features': 'sqrt',
'bootstrap': True}
Running RandomForestClassifier with suggested hyperparameters: {'n_estimators': 100, 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'bootstrap': True}
Metrics after tuning: {'accuracy': 0.9730041532071989, 'precision': 0.9722907483489197, 'recall': 0.9730041532071989, 'f1_score': 0.9724045530119824}

This is sample output from my experiment, and yours may differ. You can adjust the prompt and the generation parameters to get more varied or more tightly constrained LLM output. Either way, if you structure the code properly, you can apply an LLM to automate model selection and experimentation.
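As one example of a generation parameter, the Chat Completions API accepts `temperature`. Lower values make replies more deterministic, which helps when the reply must be a parseable dictionary. The sketch below builds the request arguments separately so they can be inspected. `build_chat_request` is a hypothetical helper, and its result would be passed as `client.chat.completions.create(**request)` inside call_llm.

```python
def build_chat_request(prompt, model="gpt-4", temperature=0.0):
    """Assemble keyword arguments for client.chat.completions.create."""
    return {
        "model": model,
        "temperature": temperature,  # lower = more repeatable replies
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in machine learning and able to evaluate the model well.",
            },
            {"role": "user", "content": prompt},
        ],
    }


request = build_chat_request("Which model should I select?", temperature=0.2)
print(request["temperature"])  # 0.2
```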

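Conclusion

LLMs are already applied in many use cases, including code generation. With an LLM such as an OpenAI GPT model, we can readily delegate model selection and experimentation, as long as we structure the output correctly. In this example, we experimented with models on a sample dataset and let the LLM choose a model and suggest hyperparameters to improve it.

Original title: Model Selection and Experimentation Automation with LLMs. Author: Cornellius Yudha Wijaya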
