Tag Archives: Machine Learning

Implementing Vector Embeddings and Semantic Search in Pure Java

Modern AI search systems, from Google to ChatGPT's retrieval pipeline, rely on converting text into numerical vectors and measuring how close those vectors are in high-dimensional space. This technique is called semantic search, and the numerical representations are called vector embeddings. Although it is the backbone of Retrieval-Augmented Generation (RAG), recommendation engines, and intelligent search, most tutorials implement it in Python. Java developers are left guessing.

This post builds a complete semantic search engine in pure Java: no LangChain4j, no Spring AI, no external dependencies. We implement TF-IDF vectorisation, cosine similarity, and a query engine that ranks documents by the similarity of their vectors instead of by exact keyword matches. By the end, you will understand the vector-similarity mathematics that vector databases build on.
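As a rough preview of the pipeline described above, here is a minimal sketch of TF-IDF vectorisation, cosine similarity, and a best-match query. The class name `TfIdfSketch`, the method names, the tokeniser, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are all assumptions made for illustration, not necessarily what the full post uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TfIdfSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Sorted vocabulary of the corpus; a term's index is its vector dimension.
    public static List<String> vocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String d : docs) vocab.addAll(tokenize(d));
        return new ArrayList<>(vocab);
    }

    // Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no term gets weight 0.
    public static double[] idf(List<String> docs, List<String> vocab) {
        double[] idf = new double[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            int df = 0;
            for (String d : docs) {
                if (tokenize(d).contains(vocab.get(i))) df++;
            }
            idf[i] = Math.log((1.0 + docs.size()) / (1.0 + df)) + 1.0;
        }
        return idf;
    }

    // TF-IDF vector: (term count / document length) * IDF for each vocabulary term.
    public static double[] vectorize(String text, List<String> vocab, double[] idf) {
        List<String> tokens = tokenize(text);
        double[] v = new double[vocab.size()];
        for (String t : tokens) {
            int i = vocab.indexOf(t);          // terms outside the vocabulary are ignored
            if (i >= 0) v[i] += 1.0 / tokens.size();
        }
        for (int i = 0; i < v.length; i++) v[i] *= idf[i];
        return v;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); defined as 0 for a zero vector.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Index of the document whose vector is closest to the query vector.
    public static int bestMatch(String query, List<String> docs) {
        List<String> vocab = vocabulary(docs);
        double[] idf = idf(docs, vocab);
        double[] q = vectorize(query, vocab, idf);
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < docs.size(); i++) {
            double score = cosine(q, vectorize(docs.get(i), vocab, idf));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the cat sat on the mat",
                "stock markets fell sharply today",
                "a cat and a dog played");
        int best = bestMatch("cat on the mat", docs);
        System.out.println("Best match: " + docs.get(best));
    }
}
```

Note that TF-IDF vectors only overlap where documents share terms, so this sketch scores two texts with no words in common at exactly zero. The cosine step is independent of how the vectors were produced and would work unchanged on vectors from any other embedding method.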


Building a Neural Network from Scratch in Pure Java (No Libraries)

Neural networks power everything from image recognition to language models, yet most tutorials use Python and hide the mathematics behind library calls. If you are a Java developer, building a neural network from raw arithmetic, with no TensorFlow, no DL4J, and no dependencies at all, is one of the best ways to internalise how learning actually works at the weight-and-gradient level.

This post implements a fully connected, multi-layer feedforward neural network in pure Java. The network learns the XOR function, a classic problem that a single-layer perceptron cannot solve because the two output classes are not linearly separable. That makes it a common sanity check that backpropagation is implemented correctly. Every line is annotated with the mathematics driving it.
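To give a feel for what such a network involves, here is a minimal sketch of a 2-input, one-hidden-layer network trained on XOR with plain backpropagation. The class name `XorNetworkSketch`, the sigmoid activation, the squared-error loss, the four hidden units, the fixed seed, and the learning rate are all assumptions chosen for illustration and may differ from the full post.

```java
import java.util.Random;

public class XorNetworkSketch {
    static final double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final double[] Y = {0, 1, 1, 0};

    final int hidden;
    final double[][] w1; // input -> hidden weights, shape [hidden][2]
    final double[] b1;   // hidden biases
    final double[] w2;   // hidden -> output weights
    double b2;           // output bias

    public XorNetworkSketch(int hidden, long seed) {
        this.hidden = hidden;
        Random rnd = new Random(seed);
        w1 = new double[hidden][2];
        b1 = new double[hidden];
        w2 = new double[hidden];
        // Initialise every parameter uniformly in [-1, 1).
        for (int j = 0; j < hidden; j++) {
            w1[j][0] = rnd.nextDouble() * 2 - 1;
            w1[j][1] = rnd.nextDouble() * 2 - 1;
            b1[j] = rnd.nextDouble() * 2 - 1;
            w2[j] = rnd.nextDouble() * 2 - 1;
        }
        b2 = rnd.nextDouble() * 2 - 1;
    }

    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Forward pass: fills h with hidden activations and returns the output activation.
    double forward(double[] x, double[] h) {
        double z = b2;
        for (int j = 0; j < hidden; j++) {
            h[j] = sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]);
            z += w2[j] * h[j];
        }
        return sigmoid(z);
    }

    public double predict(double[] x) {
        return forward(x, new double[hidden]);
    }

    // Mean squared error over the four XOR cases.
    public double meanSquaredError() {
        double sum = 0;
        for (int i = 0; i < X.length; i++) {
            double e = predict(X[i]) - Y[i];
            sum += e * e;
        }
        return sum / X.length;
    }

    // Online gradient descent: update the weights after each training example.
    public void train(int epochs, double learningRate) {
        double[] h = new double[hidden];
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < X.length; i++) {
                double out = forward(X[i], h);
                // Loss = 0.5 * (out - y)^2 and sigmoid'(z) = out * (1 - out).
                double deltaOut = (out - Y[i]) * out * (1 - out);
                for (int j = 0; j < hidden; j++) {
                    // Chain rule through w2[j] (read before it is updated) and the hidden sigmoid.
                    double deltaHidden = deltaOut * w2[j] * h[j] * (1 - h[j]);
                    w2[j] -= learningRate * deltaOut * h[j];
                    w1[j][0] -= learningRate * deltaHidden * X[i][0];
                    w1[j][1] -= learningRate * deltaHidden * X[i][1];
                    b1[j] -= learningRate * deltaHidden;
                }
                b2 -= learningRate * deltaOut;
            }
        }
    }

    public static void main(String[] args) {
        XorNetworkSketch net = new XorNetworkSketch(4, 42L);
        System.out.printf("MSE before training: %.4f%n", net.meanSquaredError());
        net.train(20000, 0.5);
        System.out.printf("MSE after training:  %.4f%n", net.meanSquaredError());
        for (double[] x : X) {
            System.out.printf("%.0f XOR %.0f -> %.3f%n", x[0], x[1], net.predict(x));
        }
    }
}
```

The hidden layer is what lets the network bend its decision boundary. With it removed, the model reduces to a single sigmoid unit, which can only draw one straight line through the input plane and so cannot separate the XOR cases.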
