Atlas

Framework Evaluation (48 Models)


Showing 1-8 of 48

Section (Weight)	Gemini 3 Pro (Google)	Grok 4 (xAI)	GPT-5 (OpenAI)	Qwen3-Max (Alibaba)	Llama 4 Maverick (Meta AI)	Gemini 2.5 Pro (Google)	Gemini 1.5 Pro (Google)	Grok 3 (xAI)
Model Details (15%)	100.0%	90.0%	93.3%	93.3%	83.3%	100.0%	80.0%	90.0%
Model Inputs & Outputs (6%)	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%
Model Data (15%)	86.7%	100.0%	86.7%	86.7%	60.0%	86.7%	86.7%	60.0%
Model Implementation and Sustainability (5%)	100.0%	40.0%	40.0%	40.0%	100.0%	100.0%	100.0%	40.0%
Intended Use (10%)	100.0%	100.0%	70.0%	70.0%	100.0%	50.0%	70.0%	50.0%
Critical Risk (20%)	70.0%	90.0%	80.0%	55.0%	80.0%	45.0%	70.0%	100.0%
Safety Evaluation (25%)	76.0%	100.0%	100.0%	16.0%	60.0%	44.0%	72.0%	64.0%
Risk Mitigations (4%)	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%


Framework & Scoring Methodology

Understanding how we evaluate model cards and calculate scores

Framework Structure

Our evaluation framework consists of 8 main sections, each with a specific weight percentage that reflects its importance in model card quality assessment.

Model Details

15% weight, 9 subsections

Model Inputs & Outputs

6% weight, 3 subsections

Model Data

15% weight, 3 subsections

Model Implementation and Sustainability

5% weight, 3 subsections

Intended Use

10% weight, 3 subsections

Critical Risk

20% weight, 5 subsections

Safety Evaluation

25% weight, 9 subsections

Risk Mitigations

4% weight, 1 subsection

How Scoring Works

1. Subsection Scoring (Boolean Logic)

Each subsection has a maximum score (e.g., "Model Architecture" = 4 points). Scoring is boolean: we check only whether the subsection's information is present in the model card, not how thorough it is.

If information is present: Boolean score = 1 → Model gets the full maximum score (e.g., 4/4)
If information is absent: Boolean score = 0 → Model gets 0 points (e.g., 0/4)

Example: If "Model Architecture" (max: 4) is documented, the model gets 4 points (4/4). If it is not documented, the model gets 0 points (0/4). There are no partial scores.
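The all-or-nothing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `subsection_points` and its parameters are hypothetical and are not taken from our scoring code.

```python
def subsection_points(max_score: int, is_documented: bool) -> int:
    """Return the full maximum when the subsection is documented, else 0.

    Hypothetical helper: there is deliberately no partial credit.
    """
    return max_score if is_documented else 0


# "Model Architecture" has a maximum of 4 points.
print(subsection_points(4, True))   # 4
print(subsection_points(4, False))  # 0
```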

2. Section Percentage Calculation

For each section, we calculate the percentage based on how many subsection points the model earned using boolean logic:

For each subsection: If boolean score = 1, add subsection max score; if 0, add 0

Section Score = Σ (Subsection Max Score) for all subsections with boolean score = 1

Section % = (Section Score ÷ Total Possible Section Score) × 100

Example: "Model Details" section has subsections totaling 15 points max. If a model has boolean score = 1 for subsections worth 3, 4, 2, and 3 points (total 12), the section percentage is (12 ÷ 15) × 100 = 80%.
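As a minimal sketch, the section percentage formula can be written as follows. The function name and the `(max_score, present)` tuple layout are assumptions made for illustration. The example list reuses the earned values from the text (3, 4, 2, and 3 points); the single undocumented 3-point subsection is a made-up placeholder so that the maximum totals 15.

```python
def section_percentage(subsections: list[tuple[int, bool]]) -> float:
    """Compute Section % from (max_score, present) pairs.

    Earned points are the sum of max scores for subsections whose
    boolean score is 1; the result is earned / possible * 100.
    """
    total_possible = sum(max_score for max_score, _ in subsections)
    if total_possible == 0:
        raise ValueError("section has no scorable subsections")
    earned = sum(max_score for max_score, present in subsections if present)
    return earned * 100 / total_possible


# Hypothetical 15-point section: 12 points earned, 3 points missing.
example = [(3, True), (4, True), (2, True), (3, True), (3, False)]
print(section_percentage(example))  # 80.0
```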

3. Section Weights

Each section has a weight percentage that indicates its relative importance in the overall framework. The eight weights sum to 100%. The three most heavily weighted sections are:

Safety Evaluation: 25%

Critical Risk: 20%

Model Details: 15%
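This page does not state how section percentages are combined into a single overall number. One plausible reading is a weighted average using the section weights, and the sketch below follows that reading purely as an assumption. The weights and the Gemini 3 Pro percentages are copied from this page; the function name `weighted_overall` is hypothetical.

```python
# Section weights as listed on this page (percent of the framework).
SECTION_WEIGHTS = {
    "Model Details": 15,
    "Model Inputs & Outputs": 6,
    "Model Data": 15,
    "Model Implementation and Sustainability": 5,
    "Intended Use": 10,
    "Critical Risk": 20,
    "Safety Evaluation": 25,
    "Risk Mitigations": 4,
}


def weighted_overall(section_pcts: dict[str, float]) -> float:
    """ASSUMPTION: overall score is the weight-averaged section percentage."""
    if sum(SECTION_WEIGHTS.values()) != 100:
        raise ValueError("section weights must sum to 100")
    return sum(SECTION_WEIGHTS[name] * pct
               for name, pct in section_pcts.items()) / 100


# Section percentages for Gemini 3 Pro, copied from the table on this page.
gemini_3_pro = {
    "Model Details": 100.0,
    "Model Inputs & Outputs": 100.0,
    "Model Data": 86.7,
    "Model Implementation and Sustainability": 100.0,
    "Intended Use": 100.0,
    "Critical Risk": 70.0,
    "Safety Evaluation": 76.0,
    "Risk Mitigations": 100.0,
}
print(round(weighted_overall(gemini_3_pro), 1))  # 86.0 under this assumption
```

Because Safety Evaluation and Critical Risk together carry 45% of the weight, a low percentage in either one would pull such a weighted figure down far more than a low percentage in Risk Mitigations (4%).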

Complete Example Calculation

Let's calculate the "Model Details" section (15% weight) for a hypothetical model:

Model Overview (max: 3): 3/3 ✓

Model Architecture (max: 4): 4/4 ✓

Model Version (max: 2): 0/2 ✗

... (other subsections)...

Note: Each subsection uses boolean scoring (1 = full points, 0 = no points)

Model Overview: boolean = 1 → 3 points

Model Architecture: boolean = 1 → 4 points

Model Version: boolean = 0 → 0 points

Total Score: 12 / 15 points

Section Percentage: (12 ÷ 15) × 100 = 80%

Color Coding

80-100%: Excellent coverage

50-79%: Moderate coverage

0-49%: Limited coverage
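The color bands can be expressed as a small lookup, sketched below. The bands on this page are given as whole-number ranges, so how a fractional value such as 79.5% is handled is not specified; treating 80 and 50 as inclusive lower thresholds is an assumption, as is the function name `coverage_label`.

```python
def coverage_label(section_pct: float) -> str:
    """Map a section percentage to its coverage band.

    ASSUMPTION: 80 and 50 are inclusive lower bounds, so a value
    between 79 and 80 falls in the moderate band.
    """
    if not 0 <= section_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if section_pct >= 80:
        return "Excellent coverage"
    if section_pct >= 50:
        return "Moderate coverage"
    return "Limited coverage"


# Safety Evaluation percentages taken from the table on this page.
print(coverage_label(100.0))  # Excellent coverage
print(coverage_label(76.0))   # Moderate coverage
print(coverage_label(16.0))   # Limited coverage
```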