About the service

ScreenGap is built for fast composition-only bandgap screening.

ScreenGap provides a lightweight workflow for screening materials data from simple CSV uploads. The current production path is meant for triage: quickly find formulas worth deeper structure-aware analysis, then validate finalists with experiment, literature, DFT, or a phase-specific model.

Starter model

Matbench-backed

The starter model is the deployed screen_matbench_model path.

JARVIS model

60/40 test split

The checked JARVIS max-12 run uses 11,836 training rows and 7,892 test rows for MBJ bandgaps.

Scope

0-12 eV target range

JARVIS rows with electronic bandgaps above 12 eV are deliberately excluded from that model run.

Benchmarks

How the current models stack up

The deployed Matbench starter model is documented with an honest 80/20 holdout. That is the verified split in the current public model artifact. The JARVIS research worker run shown here uses the curated max-12 data files with a 60/40 train/test split.

Matbench R^2

0.603

80/20 holdout

Matbench MAE

0.465 eV

lower is better

JARVIS MBJ R^2

0.758

60/40 holdout

JARVIS MBJ MAE

0.606 eV

max target 12 eV

Matbench holdout confusion matrix for metal, semiconductor, and insulator classes
Matbench holdout confusion matrix. The starter model is tuned to keep semiconductor recall high while reducing false semiconductor calls for materials that should be classified as metals.
Matbench model tradeoff plot showing metal false-semiconductor reduction
Matbench tradeoff plot. The deployed metal override reduced metal-to-semiconductor false calls from 232 to 90 in the holdout set, a 61.2% reduction.

JARVIS max-12 snapshot

The checked JARVIS curation removes invalid formulas, noble-gas rows, missing MBJ targets, and MBJ electronic bandgaps above 12 eV. For MBJ, 19,728 rows remain after curation: 11,836 train and 7,892 test.

MBJ regression R^20.758
MBJ class accuracy0.799
Semiconductor recall0.734

What this means

JARVIS gives ScreenGap a useful second lens because it is not the same data source as Matbench. The max-12 filter keeps the model focused on the electronic bandgap range most relevant to practical screening and avoids letting rare extreme-gap records dominate the fit.

Interpretability

ScreenGap is composition-only. The model does not require crystal structures, and its reporting is organized around formula-derived signals such as element mix, elemental-property aggregates, stoichiometry patterns, and composition interactions. Future reports can expose per-formula feature contributions so users can see which composition signals pushed a prediction higher or lower.

Polymorphs

Different structures can share the same formula and still have different bandgaps. ScreenGap does not choose a specific polymorph or phase from composition alone. Its result should be read as a composition-level screening estimate learned from the training distribution, not as a phase-resolved bandgap guarantee. When polymorphs are central to a decision, users should follow up with structure-aware methods or phase-specific calculations.

Data Use

Uploaded formula lists are processed for requested jobs and service operations. They are not used to train shared models without written permission. Contact support before submitting regulated, export-controlled, or highly confidential data.

Contact

For support, reply to the email you received during signup.

Status

This is an MVP. Expect improvements to model quality, speed, and reporting over time.