Class Assignment Output for Each ID after LatentClassConditionalLogit #193

asghar13 · 2024-12-01T16:46:13Z

Hello everyone,

I have a few questions regarding obtaining the output after running the LatentClassConditionalLogit model. You can see my code at the end of this message. Specifically, I need the following output:

The assignment of each ID to a class: I need to determine which class each individual ultimately belongs to. I tried the code at the end of this message for this purpose, but in the resulting file every individual ends up in a single class (I have attached the file, named individual_level_class_assignments.csv).

I would be very grateful for any assistance in solving this issue or for any documentation that could help me better understand how to use the code for this purpose.

Thank you very much for your support in advance.

Code for reading the data:

```python
# `data` is the raw survey DataFrame, loaded earlier (not shown)

# Define the columns to retain for the analysis
kept_columns = [
    "ID", "Gender", "Age", "Education", "Occup",  # Demographic and individual information
    "Gas_Capital", "Gas_Annual", "Gas_Emission", "Gas_Work",  # Attributes of "Gas"
    "Electric_Capital", "Electric_Annual", "Electric_Emission", "Electric_Work",  # Attributes of "Electric"
    "Heatpump_Capital", "Heatpump_Annual", "Heatpump_Emission", "Heatpump_Work",  # Attributes of "Heatpump"
    "Solid_Capital", "Solid_Annual", "Solid_Emission", "Solid_Work",  # Attributes of "Solid fuel"
    "Gas_Av", "Electric_Av", "Heatpump_Av", "Solid_Av",  # Availability indicators
    "Choice", "Card",  # User choices and choice cards
]

# Filter the dataset to include only the defined columns
# (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
crheating_df = data[kept_columns].copy()

# Display a preview of the filtered dataset
print("Filtered dataset:")
print(crheating_df.head())

# Display the unique values in the 'Choice' column (to verify mapping)
print("Unique values in 'Choice' column:")
print(crheating_df["Choice"].unique())

# Map categorical choices to integers for modeling
choice_mapping = {
    "Gas": 0,
    "Electric": 1,
    "Heatpump": 2,
    "Solid": 3,
}
crheating_df["Choice"] = crheating_df["Choice"].map(choice_mapping)

# Ensure the 'Choice' column is of integer type
crheating_df["Choice"] = crheating_df["Choice"].astype(int)

# Convert the dataset into the required format for choice modeling
from choice_learn.data import ChoiceDataset

dataset = ChoiceDataset.from_single_wide_df(
    df=crheating_df,  # Filtered dataset
    items_id=["Gas", "Electric", "Heatpump", "Solid"],  # Names of the alternatives
    choices_column="Choice",  # Column representing user choices
    choice_format="items_index",  # Encoding format for choices
    shared_features_columns=["Gender", "Age", "Education", "Occup"],  # Shared features across choices
    items_features_suffixes=["Capital", "Annual", "Emission", "Work"],  # Attributes for each item
    available_items_suffix="Av",  # Suffix for availability indicators
    delimiter="_",  # Delimiter used in column names
)
```

Code for LatentClassConditionalLogit:

```python
import pandas as pd

# Ensure all numerical columns are explicitly converted to float32
numerical_columns = [
    "Gender", "Age", "Education", "Occup",  # Shared features
    "Gas_Capital", "Gas_Annual", "Gas_Emission", "Gas_Work",  # Item-specific features for Gas
    "Electric_Capital", "Electric_Annual", "Electric_Emission", "Electric_Work",  # Electric
    "Heatpump_Capital", "Heatpump_Annual", "Heatpump_Emission", "Heatpump_Work",  # Heatpump
    "Solid_Capital", "Solid_Annual", "Solid_Emission", "Solid_Work",  # Solid fuel
]

# Convert these columns to float32
crheating_df[numerical_columns] = crheating_df[numerical_columns].astype("float32")

# Convert Choice column to integer (if not already)
crheating_df["Choice"] = crheating_df["Choice"].astype("int32")

# Recreate the ChoiceDataset with correctly typed columns
dataset = ChoiceDataset.from_single_wide_df(
    df=crheating_df,
    items_id=["Gas", "Electric", "Heatpump", "Solid"],  # Names of the alternatives
    choices_column="Choice",  # Column representing user choices
    choice_format="items_index",  # Encoding format for choices
    shared_features_columns=["Gender", "Age", "Education", "Occup"],  # Shared features across choices
    items_features_suffixes=["Capital", "Annual", "Emission", "Work"],  # Attributes for each item
    available_items_suffix="Av",  # Suffix for availability indicators
    delimiter="_",  # Delimiter used in column names
)

# Define and fit the model
from choice_learn.models.latent_class_mnl import LatentClassConditionalLogit

# Initialize the model
lc_model_2 = LatentClassConditionalLogit(
    n_latent_classes=3,  # Number of latent classes
    fit_method="mle",  # Maximum Likelihood Estimation
    optimizer="lbfgs",  # Optimizer
    epochs=1000,  # Number of epochs
    lbfgs_tolerance=1e-20,  # Tolerance for convergence
)

# Add shared coefficients for item-specific features
lc_model_2.add_shared_coefficient(coefficient_name="Capital", feature_name="Capital", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Annual", feature_name="Annual", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Emission", feature_name="Emission", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Work", feature_name="Work", items_indexes=[0, 1, 2, 3])

# Add shared coefficients for demographic/shared features
lc_model_2.add_shared_coefficient(coefficient_name="Gender", feature_name="Gender", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Age", feature_name="Age", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Education", feature_name="Education", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Occup", feature_name="Occup", items_indexes=[0, 1, 2, 3])

# Fit the model to the dataset
hist2 = lc_model_2.fit(dataset, verbose=1)

# Print Latent Class Model results
print("Latent Class Model weights:")
print("Classes Logits:", lc_model_2.latent_logits)
for i in range(3):  # Assuming 3 latent classes
    print("\n")
    print(f"Model Nb {i}, weights:", lc_model_2.models[i].trainable_weights)

# Evaluate the model's Negative Log-Likelihood (NLL)
nll_2 = lc_model_2.evaluate(dataset) * len(dataset)
print(f"Negative Log-Likelihood: {nll_2}")

# Generate structured output as a DataFrame for further analysis
feature_names = ["Capital", "Annual", "Emission", "Work", "Gender", "Age", "Education", "Occup"]
report_data = []
for class_idx, model in enumerate(lc_model_2.models):
    class_weights = model.trainable_weights
    for weight, feature_name in zip(class_weights, feature_names):
        coef_estimation = weight.numpy().flatten()[0]
        report_data.append({
            "Latent Class": class_idx + 1,
            "Feature": feature_name,
            "Coefficient": coef_estimation,
        })

# Convert the results into a DataFrame
report_df = pd.DataFrame(report_data)

# Save the report to a file
output_path = r"C:\Users\mohamma11\Downloads\latent_class_conditional_logit_report.csv"
report_df.to_csv(output_path, index=False)
print(f"Report saved to {output_path}")
```

Code for assigning IDs to classes:

```python
import numpy as np
import pandas as pd

# Predict the probabilities for each individual for each class
individual_probabilities = lc_model_2.predict_probas(dataset).numpy()  # Ensure this is a 2D array

# Verify the shape of the probabilities array
print(f"Shape of individual_probabilities: {individual_probabilities.shape}")

# Assign each individual to the class with the highest probability
assigned_classes = np.argmax(individual_probabilities, axis=1)

# Create a DataFrame to store the results
individual_ids = crheating_df["ID"].values  # Assuming the dataset has an "ID" column
results_df = pd.DataFrame({
    "ID": individual_ids,
    "Assigned Class": assigned_classes + 1,  # Adding 1 to match class numbering (1-based)
})

# Include probabilities for each class
for i in range(individual_probabilities.shape[1]):
    results_df[f"Class {i + 1} Probability"] = individual_probabilities[:, i]
```

individual_level_class_assignments (1).csv

VincentAuriau · 2024-12-02T10:50:06Z

Maintainer

Hello @asghar13,

Just to be sure I have understood your question: what you need is the probability that each sample belongs to each of the latent classes?
In the current version, the probability of each class is learned as a fixed value (all samples have the same probability of belonging to each class).

You can get the estimated value with the .get_latent_classes_weights() method that you can find here.

We are planning to add the possibility to estimate this probability from features, but it is not implemented yet.
If that's what you need, it should be possible with a custom model. I'd be happy to help you with this if needed.
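To illustrate the point about fixed class probabilities, here is a minimal NumPy sketch (not choice-learn code). The weights and the number of samples are hypothetical values chosen for illustration. Because the same weight vector applies to every sample, an argmax over it can only ever return one class.

```python
import numpy as np

# Hypothetical class weights; they do not depend on any feature,
# so every sample shares exactly the same vector.
class_weights = np.array([0.5, 0.3, 0.2])
n_samples = 5

# Repeat the same weights for each sample: shape (n_samples, n_classes).
per_sample_weights = np.tile(class_weights, (n_samples, 1))

# The argmax is identical for all rows, so all samples get the same class.
assigned = per_sample_weights.argmax(axis=1) + 1
print(assigned)
```

Separating individuals therefore requires extra information, such as their observed choices.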

4 replies

asghar13 Dec 2, 2024
Author

Many thanks.
Yes, I have used the `.get_latent_classes_weights()` method and obtained the following output:

Latent class weights (probabilities):
tf.Tensor([0.33680958, 0.33687294, 0.32631746], shape=(3,), dtype=float32)

It's somewhat surprising that all the classes have nearly equal weights.

However, the output I am looking for is to determine which class each individual ultimately belongs to. For instance, consider an individual with ID Cr00001 in my dataset, who makes choices across 8 different choice cards (choice tasks). During our computational process, we assign each of this individual's choices to a specific class. Therefore, we end up with 8 rows for individual Cr00001, indicating which class each of their choices is assigned to, similar to the image Assign_cards_to_classes.PNG.

The issue is that, in the output of my dataset, all 8 choices for every individual, from Cr00001 to Cr01000, are assigned to class 3.

However, this is not the output I need. I need to aggregate the 8 choices of each individual and obtain, for every individual in my dataset, the probability of that individual belonging to each class. I currently have such an output, as shown in the image Assign_ID_to_classes.PNG, but there are two issues:

1. All individuals are assigned to the same class (class 3), whereas they should be more evenly distributed across the classes, based on the weights:

   tf.Tensor([0.33680958, 0.33687294, 0.32631746], shape=(3,), dtype=float32)

2. The calculated probabilities are identical for all individuals. In other words, the probability that individual Cr00001 falls into each of classes 1 to 3 is exactly the same as the probability that individual Cr00999 falls into each of these classes.

Best

asghar13 Dec 3, 2024
Author

Hi, I hope you are well, and many thanks for your time and effort.
This is an important part of my project: after the latent class analysis, I need to identify which individuals belong to which class. My question is: am I using the right code for the class assignment?
Best

VincentAuriau Dec 3, 2024
Maintainer

Hello,
Sorry for the late answer, I thought it was the same subject as the other discussion.
Regarding the latent class weights being so similar, I would recommend making sure that there are enough epochs and that the optimization has actually reached an optimum.

As a preliminary question, why do you have 3 classes in your model weights and 4 classes in your output? They are both the number of "latent classes", right?

Here are a few explanations:

- The latent class weights are the probabilities for a (new) unknown individual to belong to each of the latent classes (they are the same for everyone since they do not depend on any feature here). Let's note them $\mathbb{P}(l)$, with $l$ a latent class.
- `predict_probas` returns the alternative probabilities: $\mathbb{P}(i) = \sum_{l} \mathbb{P}(l) \cdot \mathbb{P}(i | l)$, with $i$ the alternative.
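The mixture formula above can be checked with a small NumPy sketch. All numbers are hypothetical (3 classes, 2 samples, 4 alternatives), and this is plain array arithmetic rather than the library's implementation.

```python
import numpy as np

# Hypothetical class weights P(l) for 3 latent classes (shared by all samples).
class_weights = np.array([0.5, 0.3, 0.2])

# Hypothetical per-class choice probabilities P(i | l):
# shape (n_classes, n_samples, n_alternatives) = (3, 2, 4).
probas_given_class = np.array([
    [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]],
    [[0.1, 0.7, 0.1, 0.1], [0.2, 0.2, 0.3, 0.3]],
    [[0.25, 0.25, 0.25, 0.25], [0.1, 0.1, 0.1, 0.7]],
])

# P(i) = sum_l P(l) * P(i | l): weighted sum over the class axis.
mixture_probas = np.einsum("l,lsi->si", class_weights, probas_given_class)

# One column per alternative (4), not one per latent class (3).
print(mixture_probas.shape)
print(mixture_probas)
```

The result has one column per alternative, so an argmax over its columns picks an alternative, not a latent class. That may be why an output built this way shows 4 "classes" for a 3-class model.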
Maybe what you want is to assign each individual to the class that best predicts their 8 choices?
What about something like:

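```python
import numpy as np
import tensorflow as tf

# `model` is the fitted LatentClassConditionalLogit and
# `choice_dataset` the ChoiceDataset it was fitted on.

# Choice probabilities predicted by each latent class' own model
predicted_probas = [mnl.predict_probas(choice_dataset) for mnl in model.models]
latent_probabilities = model.get_latent_classes_weights()

# For each class, keep only the probability of the alternative actually
# chosen in each row, multiplied by the class weight
predicted_probas = [
    latent
    * tf.gather_nd(
        params=proba,
        indices=tf.stack(
            [tf.range(0, len(choice_dataset), 1), choice_dataset.choices], axis=1
        ),
    )
    for latent, proba in zip(latent_probabilities, predicted_probas)
]
# Shape: (number of choices, number of latent classes)
predicted_probas = np.stack(predicted_probas, axis=1)
```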

`predicted_probas` will then contain, for each latent class, its (weighted) predicted probability of the "effective" choice, i.e. the alternative actually chosen. Aggregating these values over each individual's choices and taking the argmax should give you the class attribution you are looking for.
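One possible way to do the aggregation is Bayes' rule: multiply each class' probabilities over an individual's choices, weight by the class probability once, and normalise. The sketch below is an assumption about how to aggregate, not something taken from the library. The IDs, weights and probabilities are hypothetical. Note that `chosen_probas` stands for the per-class probability of the chosen alternative before multiplication by the class weight, so the weight enters once per individual rather than once per choice card.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: 2 individuals with 3 choice cards each, 2 latent classes.
ids = np.array(["A", "A", "A", "B", "B", "B"])
class_weights = np.array([0.6, 0.4])  # P(l)

# P(chosen alternative | l) for each row (choice card) and each class.
chosen_probas = np.array([
    [0.8, 0.2],
    [0.7, 0.3],
    [0.9, 0.1],
    [0.1, 0.6],
    [0.2, 0.5],
    [0.3, 0.7],
])

# Summing log-probabilities per individual equals multiplying probabilities.
log_lik = pd.DataFrame(np.log(chosen_probas)).groupby(ids).sum()

# Add the log class weight once per individual, then normalise per row.
log_post = log_lik.to_numpy() + np.log(class_weights)
log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
posterior = np.exp(log_post)
posterior /= posterior.sum(axis=1, keepdims=True)

assignments = pd.DataFrame(
    posterior,
    index=log_lik.index,
    columns=[f"Class {k + 1} Probability" for k in range(posterior.shape[1])],
)
assignments["Assigned Class"] = posterior.argmax(axis=1) + 1
print(assignments)
```

Working in log space avoids underflow when an individual has many choice cards.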

asghar13 Dec 3, 2024
Author

It works very well. Many thanks for your great help.

VincentAuriau · 2024-12-23T09:29:50Z

Maintainer

Hello,
I have integrated your suggestions in the /main branch.
Thanks again for reaching out.
If the package has helped you, consider citing us and starring the repository :)
