AI Fundamentals Course: investigating machine learning algorithms and approaches

By Ricardo Costa Val do Rosario

I followed the instructions and got the result below. I then asked CoPilot 365 to analyze that result. Is it correct?

Lesson 02, Do as I did: tuning machine learning models

[1]

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

california = fetch_california_housing()
X = california.data
y = california.target


# Splitting the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

[2]

# Step 1: Import the required libraries

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


# Step 2: Load the Iris dataset

iris = load_iris()
X = iris.data  # Features (length and width of petals and sepals)
y = iris.target  # Labels (flower species)


# Step 3: Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train the model

model = DecisionTreeClassifier()
model.fit(X_train, y_train)


# Step 5: Make predictions and evaluate the model

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")
Model accuracy: 100.00%

[3]

import pandas as pd
from sklearn.datasets import load_diabetes


# Load the data

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target


# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

[4]

from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt


# Train the model

model = RandomForestRegressor()
model.fit(X_train, y_train)


# Get the feature importances

importances = model.feature_importances_
feature_names = diabetes.feature_names


# Create a horizontal bar chart

plt.figure(figsize=(10, 6))
plt.barh(feature_names, importances)
plt.title("Feature Importance")
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.show()

[5]

# Step 1: Import the required libraries

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


# Step 2: Load the Iris dataset

iris = load_iris()
X = iris.data  # Features (length and width of petals and sepals)
y = iris.target  # Labels (flower species)

# Step 3: Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Step 5: Make predictions and evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")
Model accuracy: 100.00%

Analysis by the CoPilot 365 AI

The selected code demonstrates a basic machine learning workflow using the Iris dataset and a Decision Tree Classifier.

Here's a breakdown of what each section does:

• Step 1: Import the required libraries

This section imports what is needed from the sklearn package: functions to load the Iris dataset and split the data, the Decision Tree classifier, and the accuracy metric.

• Step 2: Load the Iris dataset

This loads the Iris dataset, placing the features (measurements of the flowers) in X and the target labels (the species of the flowers) in y.

• Step 3: Split the data into training and test sets

The data is split into training sets (X_train, y_train) and testing sets (X_test, y_test).
The training data is used to train the model, and the testing data is used to evaluate its performance. test_size=0.3 means 30% of the data is used for testing, and random_state=42 ensures the split is the same each time the code is run.

• Step 4: Train the model

A Decision Tree Classifier model is created and then trained using the training data.

• Step 5: Make predictions and evaluate the model

The trained model makes predictions on the test data (y_pred).
The accuracy_score function then compares these predictions to the actual labels (y_test) to calculate the accuracy of the model.
Finally, the accuracy is printed to the console.
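
The split and scoring steps described above can be checked directly. The following is a minimal sketch, assuming the same test_size=0.3 and random_state=42 as in the notebook. The helper name split_and_score is hypothetical, and passing random_state=0 to the classifier is an added assumption so that the tree itself is reproducible as well.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def split_and_score(seed=42):
    """Hypothetical helper: split Iris, fit a tree, return sizes and scores."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # random_state=0 is an assumption added here for reproducibility
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Accuracy is the fraction of predictions that match the true labels
    manual_acc = float(np.mean(y_pred == y_test))
    return len(X_train), len(X_test), accuracy_score(y_test, y_pred), manual_acc, y_test


n_train, n_test, acc, manual_acc, y_test_a = split_and_score()
_, _, _, _, y_test_b = split_and_score()

print(f"Training samples: {n_train}, test samples: {n_test}")
print(f"accuracy_score matches manual fraction: {np.isclose(acc, manual_acc)}")
print(f"Same test labels on a second run: {np.array_equal(y_test_a, y_test_b)}")
```

With 150 Iris samples, a 30% test split leaves 105 samples for training and 45 for testing, so each misclassified flower changes the reported accuracy by roughly 2.2 percentage points.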

Hi, Ricardo! How are you?

You are on the right track, congratulations! You shared an alternative way of completing the activities. The only thing I would highlight is hyperparameter tuning, which the activity statement asks for when training the models.

For example:

For the decision tree model with the Iris dataset, you can use GridSearchCV from the sklearn library to find the best hyperparameters. This involves testing different combinations of parameters, such as the maximum depth of the tree or the split criterion, to see which ones give the best performance.

Here is an example used by the instructor:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hyperparameters to be tested
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Creating the model (a regressor, so X_train and y_train should come from
# a regression dataset such as California Housing or Diabetes)
model = DecisionTreeRegressor()

# Applying Grid Search with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Showing the best parameters found
print(f"Best parameters: {grid_search.best_params_}")
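
The example above uses a regressor with a mean-squared-error score. For the Iris classification task, a comparable search could look like the sketch below. It is only an illustration: the grid values, the use of scoring="accuracy", and random_state=0 on the classifier are assumptions, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumed grid for illustration: maximum depth and split criterion
param_grid = {
    "max_depth": [2, 3, 5, None],
    "criterion": ["gini", "entropy"],
}
# 4 depths x 2 criteria = 8 candidate combinations
n_candidates = len(ParameterGrid(param_grid))

grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training set only
    scoring="accuracy",  # classification metric instead of squared error
)
grid_search.fit(X_train, y_train)

print(f"Candidates tested: {n_candidates}")
print(f"Best parameters: {grid_search.best_params_}")
print(f"Mean cross-validated accuracy: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```

The cross-validated score averages over five different validation folds, so it is a useful complement to a single 100% result on one 45-sample test split.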


Keep up your studies, and if you have any questions, share them on the forum.

Count on the support of the Alura community throughout your journey. Best wishes and happy studying!