Hoxton Farms Open-Sources AI Models for Biomanufacturing: BrightQuant and LipiQuant

Meet Hoxton Farms’ latest tools for AI image analysis: BrightQuant and LipiQuant

High-content imaging promises a wealth of data for cell biologists, but many image analysis pipelines remain highly subjective, unreliable and slow. At Hoxton Farms, we've developed AI computer vision models to accelerate analysis and decision-making using microscopy datasets.

Today, we are releasing three computer vision models: BrightQuant (label-free cell counting from brightfield images), LipiQuant (lipid droplet segmentation for adherent and reattached cells) and a nucleus detection model (a segmentation model for nucleus-stained images).

LipiQuant: Segmentation of lipid droplets

LipiQuant is a segmentation model trained to detect intracellular lipid droplets. It uses the StarDist architecture, which represents each object not as a pixel-wise mask but as a star-convex polygon, predicting the distance to the object boundary along a fixed set of radial directions. This representation works well for approximately convex objects and enables robust droplet detection in challenging brightfield conditions. Here is an example of the model’s output on human adipocytes (image courtesy of Dr Paul Cohen and Dr Luke Olsen, The Rockefeller University).
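For context, here is a minimal sketch of how a StarDist-style model can be run on an image. The model name, directory and file names below are placeholders, not necessarily how the released LipiQuant weights are packaged.

```python
import numpy as np
from csbdeep.utils import normalize
from stardist.models import StarDist2D
from tifffile import imread

# Load a trained StarDist model from disk (name/basedir are assumptions).
model = StarDist2D(None, name="lipiquant", basedir="models")

# Read an image and normalise intensities to the 1st-99.8th percentile range,
# the preprocessing the StarDist examples typically use.
img = imread("adipocytes_brightfield.tif")
labels, details = model.predict_instances(normalize(img, 1, 99.8))

# `labels` is an integer mask (0 = background, 1..N = droplet instances);
# `details["coord"]` holds the star-convex polygon for each detected droplet.
print(f"Detected {labels.max()} lipid droplets")
```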

Once droplets are detected, we can quantify lipid accumulation, which we use as a reliable proxy for the progression of differentiation. In 2D images droplets appear circular, but assuming they are approximately spherical allows us to estimate their volume from the segmented area. As a straightforward metric of adipogenic differentiation, we calculate the sum of droplet areas or volumes within an image or well.
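Under that spherical assumption, the conversion from segmented area to volume is straightforward. The sketch below shows one way to compute the per-image metrics; the function name and units are illustrative.

```python
import numpy as np

def droplet_metrics(areas_um2: np.ndarray) -> dict:
    """Summarise droplet sizes for one image, assuming spherical droplets.

    `areas_um2` is the array of per-droplet segmented areas in square microns.
    """
    # Radius of the circle with the same area as the 2D segmentation...
    radii = np.sqrt(areas_um2 / np.pi)
    # ...and the volume of the sphere with that radius.
    volumes = (4.0 / 3.0) * np.pi * radii**3
    return {
        "n_droplets": int(areas_um2.size),
        "total_area_um2": float(areas_um2.sum()),
        "total_volume_um3": float(volumes.sum()),
    }
```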

Crucially, because these metrics are derived from stain-free imaging, we can monitor cultures continuously without interacting with or disturbing the cells. The example below shows how droplet areas change over time in one of our media screens.

Each line represents a single well (e.g. E05) in a multi-well plate along a timecourse. The value on the y axis is the sum of the areas of all detected droplets at that time point, averaged across the fields of view collected for that well.
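For illustration, the aggregation behind a plot like this might look like the following pandas sketch; the table layout and column names are assumptions, not the exact format our pipeline uses.

```python
import pandas as pd

# One row per detected droplet; column names here are illustrative.
droplets = pd.read_csv("droplets.csv")  # columns: well, field_of_view, timepoint, area_um2

# Sum droplet areas within each field of view, then average across the
# fields of view collected for each well at each time point.
per_fov = (
    droplets.groupby(["well", "timepoint", "field_of_view"])["area_um2"]
    .sum()
    .reset_index()
)
per_well = per_fov.groupby(["well", "timepoint"])["area_um2"].mean().unstack("well")

# One line per well along the timecourse.
per_well.plot(xlabel="Time point", ylabel="Mean total droplet area per FOV (µm²)")
```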

With multiple time points and per-droplet information, we can start asking questions that would not be possible to answer with, say, a fluorometric endpoint assay. For example, we can measure how quickly droplets accumulate at different phases of differentiation, or compare conditions using the full distribution of droplet sizes rather than only the total lipid content. Are we seeing many small droplets or a few large ones? Does the population behave uniformly or are there subpopulations that differentiate more or less readily?

At the end of culture we can go further by staining the nuclei with Hoechst, which lets us locate each individual cell precisely. Using our nucleus detection model, we assign each droplet to its nearest nucleus, giving us per cell lipid metrics and a direct view of heterogeneity in differentiation. As a simple illustration, per-cell LipiQuant results can hint at whether a culture is genetically uniform or whether distinct subpopulations respond differently to the same differentiation medium.
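As a sketch of the droplet-to-nucleus assignment, a nearest-neighbour lookup on centroids is enough; the function below uses a k-d tree and is illustrative rather than our exact implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def assign_droplets_to_nuclei(droplet_centroids: np.ndarray,
                              nucleus_centroids: np.ndarray) -> np.ndarray:
    """Return, for each droplet, the index of its nearest nucleus.

    Both inputs are (N, 2) arrays of (y, x) centroid coordinates in pixels.
    """
    tree = cKDTree(nucleus_centroids)
    _, nearest = tree.query(droplet_centroids)
    return nearest

# Per-cell lipid metric: total droplet area assigned to each nucleus.
# nearest = assign_droplets_to_nuclei(droplet_xy, nucleus_xy)
# per_cell_area = np.bincount(nearest, weights=droplet_areas,
#                             minlength=len(nucleus_xy))
```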

BrightQuant: Label-Free Cell Counting

Understanding how quickly cells proliferate under different conditions is essential for any scalable cell culture process. Reliable cell counts over time are a core part of that, and while our nucleus detection model can provide these counts, staining has limitations. Even relatively benign dyes such as Hoechst can influence growth if cells are exposed for extended periods, and a growth experiment would require either sacrificial wells or end-point-only staining. Staining also adds procedural steps that increase hands-on time for our wet-lab team.

A brightfield-based counting method would avoid these issues — but brightfield cell counting is non-trivial. Human annotators often struggle to label all cells accurately, particularly at 4X magnification. At higher magnifications, cells are easier to see but imaging time increases substantially. Even then, cells overlap at high cell density and can exhibit a wide range of morphologies in different states and conditions. Together, these factors make classic segmentation-based approaches unreliable or too labour-intensive to train.

Our solution relies on two key insights:

  1. We do not need exact segmentation masks, only accurate counts.

    Instead of predicting individual cell boundaries, we predict a cell density map whose integral corresponds to the number of cells present.

  2. We can use our nucleus detection model to generate the ground truth.

    For each detected nucleus in a Hoechst-stained image, we place a Gaussian blob on a target density map such that the total sum contributed by each nucleus is 1. The BrightQuant model is then trained to generate this density map directly from the corresponding brightfield image. A well-trained model produces an output whose pixel sum is a good estimator of the true cell count.
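A minimal sketch of how such a density target can be built from nucleus detections is shown below; the sigma value and function signature are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(nucleus_centroids: np.ndarray,
                   shape: tuple[int, int],
                   sigma: float = 10.0) -> np.ndarray:
    """Build a training target from nucleus detections.

    Places a unit impulse at each nucleus centroid and blurs with a Gaussian,
    so every nucleus contributes 1 to the total and the sum of the map equals
    the cell count. `sigma` (in pixels) is an assumption, chosen to roughly
    match a typical cell radius.
    """
    target = np.zeros(shape, dtype=np.float32)
    ys, xs = np.round(nucleus_centroids).astype(int).T
    np.add.at(target, (ys.clip(0, shape[0] - 1), xs.clip(0, shape[1] - 1)), 1.0)
    # gaussian_filter uses a normalised kernel, so the total sum is preserved
    # up to edge effects where blobs spill outside the image.
    return gaussian_filter(target, sigma=sigma)
```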

Below you can see four images that illustrate the idea:

The left-hand side shows the imaged cells, Hoechst-stained (top) and in brightfield (bottom). During training, we first run the nucleus detection model on the top image to produce the image on the top-right. With the additional processing described above, this becomes the ground truth. We then run BrightQuant on the brightfield image to produce the image on the bottom-right. The two are then compared and the model updated to more closely align with the ground truth.

We chose the size of the Gaussian target to roughly match the diameter of a typical cell. Because the model is trained with a pixel-wise loss (RMSE), using a broader Gaussian makes the loss surface smoother: the model is rewarded not only for predicting the exact centre of a cell but also for getting close. If the Gaussian were too narrow, the loss would barely change when the model placed a predicted blob a few pixels away from the true location, leading to very small gradients and slower learning.
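For completeness, the training objective described above reduces to a pixel-wise RMSE between the predicted and target density maps; the snippet below is a minimal PyTorch sketch of that loss and of how a count is read off at inference time.

```python
import torch

def rmse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-wise RMSE between the predicted and target density maps."""
    return torch.sqrt(torch.mean((pred - target) ** 2))

# At inference time, the estimated cell count is simply the sum of the
# predicted density map over its spatial dimensions:
# count = pred.sum(dim=(-2, -1))
```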

We made several other design choices to help the model generalise well. Microscopy images do not have a preferred orientation or position (there is no "right-way-up") so the model should not assume one. To enforce this, we augmented the training data by applying random rotations and crops to both the input images and their ground-truth targets. This encouraged the desired rotational and translational symmetry and effectively increased the size of the dataset.
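A joint augmentation step of this kind might look like the sketch below. Restricting rotations to multiples of 90° is an illustrative simplification that avoids interpolating the density target, not necessarily the exact scheme we used in training.

```python
import numpy as np

def augment(image: np.ndarray, target: np.ndarray, crop: int,
            rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Apply the same random rotation, flip and crop to an image and its target."""
    # Random rotation by a multiple of 90 degrees, plus an optional flip.
    k = rng.integers(0, 4)
    image, target = np.rot90(image, k), np.rot90(target, k)
    if rng.random() < 0.5:
        image, target = np.flipud(image), np.flipud(target)

    # Random crop, applied identically to both arrays.
    y = rng.integers(0, image.shape[0] - crop + 1)
    x = rng.integers(0, image.shape[1] - crop + 1)
    return (image[y:y + crop, x:x + crop].copy(),
            target[y:y + crop, x:x + crop].copy())
```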

For the model architecture, we used a U-Net, which is widely used in microscopy applications, is relatively small and fast to train, and, because it is fully convolutional, naturally respects translational symmetry.
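To make this concrete, here is a compact PyTorch U-Net that maps a single-channel brightfield image to a non-negative density map; the depth and channel widths are illustrative, not the released model's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """A two-level U-Net mapping a brightfield image to a density map."""
    def __init__(self, base: int = 32):
        super().__init__()
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encoder with skip connections, then decoder with upsampling.
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        # ReLU keeps the predicted density non-negative.
        return torch.relu(self.head(d1))
```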

Finally, we trained on a large dataset: over 2 TB of images. To keep the GPUs fully utilised, we trained on AWS SageMaker with data stored on an FSx for Lustre file system, which provides fast streaming access. With a mounted filesystem and in-memory caching we ensured that data loading did not become a bottleneck during training.
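A simple way to layer in-memory caching on top of a streamed filesystem is sketched below; the file layout, cache size and use of lru_cache are assumptions, and with multiple data-loader workers each process keeps its own cache.

```python
from functools import lru_cache

import numpy as np
import torch
from tifffile import imread
from torch.utils.data import Dataset

class CachedPairs(Dataset):
    """Brightfield / density-target pairs with simple in-memory caching."""

    def __init__(self, image_paths: list[str], target_paths: list[str]):
        self.image_paths, self.target_paths = image_paths, target_paths

    @staticmethod
    @lru_cache(maxsize=4096)
    def _load(path: str) -> np.ndarray:
        # First access streams from the mounted filesystem; repeats hit RAM.
        return imread(path).astype(np.float32)

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, i: int):
        # Copy so downstream augmentation cannot mutate the cached arrays.
        image = torch.from_numpy(self._load(self.image_paths[i]).copy())[None]
        target = torch.from_numpy(self._load(self.target_paths[i]).copy())[None]
        return image, target
```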

Performance

As with LipiQuant, BrightQuant lets us monitor cultures over time without interfering with the cell culture. In growth experiments, it enables us to separate growth rate from maximum density and understand which variables influence each. And because the model predicts a smooth density field rather than individual cell instances, challenges such as overlapping cells or low magnification are no longer significant.

We have validated BrightQuant on a variety of cell types, morphologies and vessel geometries. Across a range of densities up to 1e5 cells / cm^2, and independent of plate type or cell line, the error is roughly 5e3 cells / cm^2.

A scatter plot showing the estimated cell count, with the ground truth on the y-axis and BrightQuant output on the x-axis. Each data point represents a single well, obtained by averaging the per-field-of-view output of the nucleus detection model and BrightQuant. The conversion to cell density is done by dividing by the area of a single field of view.

When faced with data outside its training distribution, BrightQuant will predict a number proportional to the cell count, rather than the cell count itself. This means you can calibrate its outputs, as you would with any other assay. At Hoxton Farms we've found that staining cells at the endpoint of an experiment is a cheap way to get accurate cell counts without having to retrain the model frequently.
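A calibration of this kind can be as simple as a straight-line fit between BrightQuant's per-well output and endpoint counts from the nucleus detection model. The sketch below is one way to do it with NumPy; the function and variable names are illustrative.

```python
import numpy as np

def fit_calibration(brightquant_counts: np.ndarray,
                    stained_counts: np.ndarray) -> tuple[float, float]:
    """Fit a straight line mapping BrightQuant output to stained counts.

    `brightquant_counts` are per-well BrightQuant estimates and
    `stained_counts` the matched endpoint counts from the nucleus detection
    model. A linear fit is an assumption that holds when the model's output
    is proportional to the true count.
    """
    slope, intercept = np.polyfit(brightquant_counts, stained_counts, deg=1)
    return float(slope), float(intercept)

# Applying the calibration to new wells:
# slope, intercept = fit_calibration(bq_counts_old, stained_counts_old)
# calibrated_counts = slope * bq_counts_new + intercept
```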


We built these models for our own pipelines, but the underlying problems are widely shared across cell biology. They’ll matter even more as biomanufacturing scales: counting cells reliably, quantifying intracellular lipid accumulation, and linking image-derived features back to individual cells. We’re sharing these models in the hope they’re useful beyond Hoxton Farms, and we’d genuinely welcome feedback, issues, benchmarks, suggested improvements, or contributions from teams working with similar imaging setups. If you use them, we’d love to hear what you’re measuring and what you’d like to see next.

Accessing the Models

All three models (BrightQuant, LipiQuant and our nucleus detection model) are now available in this repository.

The repository includes instructions for installation and basic usage, as well as example scripts for running inference on your own images. We hope these models will be useful for researchers working with similar imaging modalities, and that they support new types of quantitative questions that are otherwise difficult to answer.

If you use these tools or adapt them for your own workflows, we would be interested to hear about your experience at hello@hoxtonfarms.com.