Machine learning: installing the libtorch library and using it for house price prediction

TensorFlow and PyTorch are both mainstream, industrial-grade machine learning libraries for Python, and their underlying code is written in C and C++. I tried using their C and C++ libraries.

I use PyTorch's C++ library, libtorch, for machine learning training.

TensorFlow also has a C library, but it is designed mainly for running models that were trained in Python, and it is difficult to use for training.

So libtorch is easier to use than TensorFlow's C library, because libtorch exposes PyTorch's core features (tensors, autograd, modules, optimizers) directly in C++.

You can download libtorch from this address: https://pytorch.org/get-started/locally/

The direct download link for the Linux CPU build, version 2.10.0, is: https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.10.0%2Bcpu.zip

After the download is complete, unzip it. On a Linux system, you need to install g++. I guess it needs to be version 10 or later; my version is g++ 13.3.0.

Then create a C++ source file:

a.cpp


#include <torch/torch.h>
#include <iostream>

// 8 -> 64 -> 32 -> 1
struct Net : torch::nn::Module {
    torch::nn::Linear l1{nullptr}, l2{nullptr}, l3{nullptr};

    Net() {
        l1 = register_module("l1", torch::nn::Linear(8, 64));
        l2 = register_module("l2", torch::nn::Linear(64, 32));
        l3 = register_module("l3", torch::nn::Linear(32, 1));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(l1(x));
        x = torch::relu(l2(x));
        x = l3(x);
        return x;
    }
};

int main() {
    // ======================
    // 1. Prepare training data
    // ======================

    // columns:
    // longitude,latitude,housing_age,total_rooms,total_bedrooms,
    // population,households,median_income
    torch::Tensor X = torch::tensor({
        {-122.23f, 37.88f, 41.0f,  880.0f, 129.0f,  322.0f, 126.0f, 8.3252f},
        {-122.22f, 37.86f, 21.0f, 7099.0f,1106.0f,2401.0f,1138.0f, 8.3014f},
        {-118.00f, 34.00f, 30.0f, 3000.0f, 500.0f,1500.0f, 600.0f, 4.0000f},
        {-121.00f, 38.00f, 15.0f, 2000.0f, 300.0f, 800.0f, 250.0f, 3.0000f}
    }, torch::kFloat);


    // Housing prices
    torch::Tensor Y = torch::tensor({
        {452600.0f},
        {358500.0f},
        {150000.0f},
        {120000.0f}
    }, torch::kFloat);


    // ======================
    // 2. Create Network
    // ======================
    Net net;
    net.train();

    torch::optim::Adam optimizer(net.parameters(), 0.001);

    // ======================
    // 3. Training
    // ======================
    for (int epoch = 0; epoch < 5000; epoch++) {
        optimizer.zero_grad();

        auto pred = net.forward(X);
        auto loss = torch::mse_loss(pred, Y);

        loss.backward();
        optimizer.step();

        if (epoch % 500 == 0) {
            std::cout << "Epoch " << epoch
                      << "  Loss = " << loss.item<float>() << std::endl;
        }
    }

    // ======================
    // 4. Prediction
    // ======================
    net.eval();

    torch::Tensor test = torch::tensor({
        {-122.25f, 37.87f, 30.0f, 2000.0f, 300.0f, 800.0f, 250.0f, 7.0f}
    }, torch::kFloat);

    auto out = net.forward(test);

    std::cout << "\nPredicted house price = "
              << out.item<float>() << std::endl;

    return 0;
}

Compile it, adding your own header include directories and library directory, and link against the libtorch dynamic libraries:

g++ a.cpp  -I/home/uu1/Documents/libtorch/include -I/home/uu1/Documents/libtorch/include/torch/csrc/api/include -L/home/uu1/Documents/libtorch/lib -ltorch -ltorch_cpu -lc10 -lpthread

The program requires the support of the libtorch dynamic library to run.

The Linux dynamic linker (ld.so) by default searches for dynamic library files in certain system directories, and it doesn’t know where the libtorch dynamic library files are located.

So I need to temporarily set LD_LIBRARY_PATH so that it can find the libtorch dynamic library files.

export LD_LIBRARY_PATH=/home/uu1/Documents/libtorch/lib:$LD_LIBRARY_PATH

After that, I added the x attribute to the program file and then ran it. (g++ normally marks its output as executable already, so the chmod is usually unnecessary.)

chmod +x a.out
./a.out

Output:

Epoch 0  Loss = 9.2591e+10
Epoch 500  Loss = 3.97314e+10
Epoch 1000  Loss = 3.79781e+10
Epoch 1500  Loss = 3.27318e+10
Epoch 2000  Loss = 1.41229e+10
Epoch 2500  Loss = 2.126e+09
Epoch 3000  Loss = 2.53379e+08
Epoch 3500  Loss = 5.67402e+07
Epoch 4000  Loss = 1.54294e+07
Epoch 4500  Loss = 3.14821e+06

Predicted house price = 182347

With only four training samples the result is extremely overfitted, but numerically it is correct.

The loss value of 3.14821e+06 looks very large, but it is a mean squared error, so the typical error is its square root.

error^2 = 3.14821e+06
error ≈ 1774

The scale of house prices is 120,000 – 450,000, so the error is only about 0.6%.

1800 / 300,000 ≈ 0.6%

I found a California housing price dataset, and it can be downloaded here: https://www.kaggle.com/api/v1/datasets/download/camnugent/california-housing-prices

I downloaded it as archive.zip.

It contains a CSV file with more than 20,000 rows, and its data looks like this:

longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
-122.25,37.85,52.0,919.0,213.0,413.0,193.0,4.0368,269700.0,NEAR BAY
-122.25,37.84,52.0,2535.0,489.0,1094.0,514.0,3.6591,299200.0,NEAR BAY
-122.25,37.84,52.0,3104.0,687.0,1157.0,647.0,3.12,241400.0,NEAR BAY
-122.26,37.84,42.0,2555.0,665.0,1206.0,595.0,2.0804,226700.0,NEAR BAY

This is the new code. It loads the housing price data from the CSV file, drops the last column, ocean_proximity, standardizes the features and the price, and trains on the result to predict housing prices.


#include <torch/torch.h>
#include <cstdio>      // std::sscanf
#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept>   // std::runtime_error
#include <string>
#include <vector>

// =======================
// Neural Network
// 8 -> 64 -> 32 -> 1
// =======================
struct Net : torch::nn::Module {
    torch::nn::Linear l1{nullptr}, l2{nullptr}, l3{nullptr};

    Net() {
        l1 = register_module("l1", torch::nn::Linear(8, 64));
        l2 = register_module("l2", torch::nn::Linear(64, 32));
        l3 = register_module("l3", torch::nn::Linear(32, 1));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(l1(x));
        x = torch::relu(l2(x));
        x = l3(x);
        return x;
    }
};

// =======================
// CSV Loader + Standardization
// =======================
void load_and_standardize(const std::string& filename,
                          torch::Tensor& X,
                          torch::Tensor& Y,
                          torch::Tensor& X_mean,
                          torch::Tensor& X_std,
                          torch::Tensor& Y_mean,
                          torch::Tensor& Y_std)
{
    std::ifstream file(filename);
    if (!file.is_open()) throw std::runtime_error("Cannot open CSV");

    std::string line;
    std::getline(file, line); // skip header

    std::vector<float> x_data;
    std::vector<float> y_data;

    int rows = 0;
    while (std::getline(file, line) && rows < 3000) {

        float longitude, latitude, housing_age;
        float total_rooms, total_bedrooms, population;
        float households, median_income;
        float median_house_value;
        char ocean[512];

        // %511 leaves room for the terminating '\0' in the 512-byte buffer
        int n = std::sscanf(
            line.c_str(),
            "%f,%f,%f,%f,%f,%f,%f,%f,%f,%511[^\n]",
            &longitude,
            &latitude,
            &housing_age,
            &total_rooms,
            &total_bedrooms,
            &population,
            &households,
            &median_income,
            &median_house_value,
            ocean
        );

        if (n < 9) continue;

        x_data.push_back(longitude);
        x_data.push_back(latitude);
        x_data.push_back(housing_age);
        x_data.push_back(total_rooms);
        x_data.push_back(total_bedrooms);
        x_data.push_back(population);
        x_data.push_back(households);
        x_data.push_back(median_income);

        y_data.push_back(median_house_value);

        rows++;
    }

    X = torch::from_blob(x_data.data(), {rows, 8}, torch::kFloat).clone();
    Y = torch::from_blob(y_data.data(), {rows, 1}, torch::kFloat).clone();

    // Compute mean and std
    X_mean = X.mean(0, false);
    X_std  = X.std(0, false);
    Y_mean = Y.mean();
    Y_std  = Y.std();

    // Standardize
    X = (X - X_mean) / X_std;
    Y = (Y - Y_mean) / Y_std;

    std::cout << "Loaded and standardized rows: " << rows << std::endl;
}

// =======================
// Main
// =======================
int main() {

    torch::Tensor X, Y, X_mean, X_std, Y_mean, Y_std;

    try {
        load_and_standardize("housing.csv", X, Y, X_mean, X_std, Y_mean, Y_std);
    } catch (const std::exception& e) {
        std::cerr << "CSV error: " << e.what() << std::endl;
        return -1;
    }

    // ======================
    // Create network
    // ======================
    Net net;
    net.train();

    torch::optim::Adam optimizer(net.parameters(), 0.001);

    // ======================
    // Training
    // ======================
    for (int epoch = 0; epoch < 6000; epoch++) {
        optimizer.zero_grad();
        auto pred = net.forward(X);
        auto loss = torch::mse_loss(pred, Y);
        loss.backward();
        optimizer.step();

        if (epoch % 100 == 0)
            std::cout << "Epoch " << epoch << "  Loss = " << loss.item<float>() << std::endl;
    }

    // ======================
    // Prediction
    // ======================
    net.eval();

    torch::Tensor test = torch::tensor({
        {-118.22f, 34.67f, 28.0f, 2357.0f, 408.0f, 1162.0f, 384.0f, 4.3636f}
    }, torch::kFloat);

    // Standardize test
    test = (test - X_mean) / X_std;

    auto out_std = net.forward(test);

    // Convert back to real price
    auto out = out_std * Y_std + Y_mean;

    std::cout << "\nPredicted house price = " << out.item<float>() << std::endl;

    return 0;
}

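It trains on the first 3,000 rows that parse successfully (rows where sscanf reads fewer than nine numbers are skipped), and the test input is the row at line 9,090 of the file.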

-118.22,34.67,28.0,2357.0,408.0,1162.0,384.0,4.3636,179700.0,INLAND

Output (only the last epochs are shown):


Epoch 5300  Loss = 0.0656063
Epoch 5400  Loss = 0.0652267
Epoch 5500  Loss = 0.0653379
Epoch 5600  Loss = 0.064889
Epoch 5700  Loss = 0.0643134
Epoch 5800  Loss = 0.063881
Epoch 5900  Loss = 0.0634381

Predicted house price = 84482.7

Obviously, the prediction failed: the program predicted about 84,483, while the actual value for that row is 179,700. I don't know why; I'm not a machine learning expert, I'm just playing around.