Framework Evaluation (48 Models)
| Section (weight) | Gemini 3 Pro (Google) | Grok 4 (xAI) | GPT-5 (OpenAI) | Qwen3-Max (Alibaba) | Llama 4 Maverick (Meta AI) | Gemini 2.5 Pro (Google) | Gemini 1.5 Pro (Google) | Grok 3 (xAI) |
|---|---|---|---|---|---|---|---|---|
| Model Details (15% weight) | 100.0% | 90.0% | 93.3% | 93.3% | 83.3% | 100.0% | 80.0% | 90.0% |
| Model Inputs & Outputs (6% weight) | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Model Data (15% weight) | 86.7% | 100.0% | 86.7% | 86.7% | 60.0% | 86.7% | 86.7% | 60.0% |
| Model Implementation and Sustainability (5% weight) | 100.0% | 40.0% | 40.0% | 40.0% | 100.0% | 100.0% | 100.0% | 40.0% |
| Intended Use (10% weight) | 100.0% | 100.0% | 70.0% | 70.0% | 100.0% | 50.0% | 70.0% | 50.0% |
| Critical Risk (20% weight) | 70.0% | 90.0% | 80.0% | 55.0% | 80.0% | 45.0% | 70.0% | 100.0% |
| Safety Evaluation (25% weight) | 76.0% | 100.0% | 100.0% | 16.0% | 60.0% | 44.0% | 72.0% | 64.0% |
| Risk Mitigations (4% weight) | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
Framework & Scoring Methodology
Understanding how we evaluate model cards and calculate scores
Framework Structure
Our evaluation framework consists of 8 main sections, each with a specific weight percentage that reflects its importance in model card quality assessment.
How Scoring Works
1. Subsection Scoring (Boolean Logic)
Each subsection has a maximum score (e.g., "Model Architecture" = 4 points). Scoring is boolean, based only on whether the subsection's information is present in the model card:
- If information is present: Boolean score = 1 → Model gets the full maximum score (e.g., 4/4)
- If information is absent: Boolean score = 0 → Model gets 0 points (e.g., 0/4)
Example: If "Model Architecture" (max: 4) is documented, the boolean score is 1 and the model gets 4/4. If it is not documented, the boolean score is 0 and the model gets 0/4. There are no partial scores.
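The all-or-nothing rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the framework's actual implementation; the function name `subsection_points` is an assumption made for the example.

```python
def subsection_points(is_documented: bool, max_points: int) -> int:
    """All-or-nothing scoring: full points if documented, otherwise zero."""
    return max_points if is_documented else 0


# "Model Architecture" has a maximum of 4 points.
print(subsection_points(True, 4))   # 4
print(subsection_points(False, 4))  # 0
```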
2. Section Percentage Calculation
For each section, the percentage is the number of subsection points the model earned under boolean scoring, divided by the section's maximum points and multiplied by 100.
Example: "Model Details" section has subsections totaling 15 points max. If a model has boolean score = 1 for subsections worth 3, 4, 2, and 3 points (total 12), the section percentage is (12 ÷ 15) × 100 = 80%.
3. Section Weights
Each section has a weight percentage that indicates its relative importance in the overall framework. The weights sum to 100% and reflect the framework's priorities. Each weight is listed next to its section name in the table above.
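As a rough sketch, the weights from the results table can be written down and checked to sum to 100. The text does not say how section percentages are combined into an overall score, so the weighted average in `weighted_overall` is an assumption, shown only to illustrate what "relative importance" could mean in practice. The section percentages are taken from the Gemini 3 Pro column of the table.

```python
# Section weights (in percent) as listed in the results table.
WEIGHTS = {
    "Model Details": 15,
    "Model Inputs & Outputs": 6,
    "Model Data": 15,
    "Model Implementation and Sustainability": 5,
    "Intended Use": 10,
    "Critical Risk": 20,
    "Safety Evaluation": 25,
    "Risk Mitigations": 4,
}

# The weights are stated to sum to 100%.
assert sum(WEIGHTS.values()) == 100


def weighted_overall(section_pcts):
    """ASSUMPTION: overall score = weighted average of section percentages."""
    return sum(WEIGHTS[name] * pct for name, pct in section_pcts.items()) / 100


# Section percentages from the Gemini 3 Pro column of the table.
GEMINI_3_PRO = {
    "Model Details": 100.0,
    "Model Inputs & Outputs": 100.0,
    "Model Data": 86.7,
    "Model Implementation and Sustainability": 100.0,
    "Intended Use": 100.0,
    "Critical Risk": 70.0,
    "Safety Evaluation": 76.0,
    "Risk Mitigations": 100.0,
}

print(round(weighted_overall(GEMINI_3_PRO), 1))
```

Under this assumed formula, a low score in a heavily weighted section such as Safety Evaluation (25%) lowers the total far more than the same score in Risk Mitigations (4%).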
Complete Example Calculation
Let's calculate the "Model Details" section (15% weight) for a hypothetical model:
Note: Each subsection uses boolean scoring (1 = full points, 0 = no points)
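A minimal sketch of such a calculation, reusing the numbers from the section-percentage example (subsections worth 3, 4, 2, and 3 points documented, out of a 15-point maximum). The subsection labels are hypothetical placeholders, and treating the remaining 3 points as a single undocumented subsection is an assumption made for illustration.

```python
# (label, max_points, documented) -- labels and the 3-point gap are hypothetical.
SUBSECTIONS = [
    ("Subsection A", 3, True),
    ("Subsection B", 4, True),
    ("Subsection C", 2, True),
    ("Subsection D", 3, True),
    ("Subsection E", 3, False),
]


def section_percentage(subsections):
    """Return (earned, total, percentage) under boolean scoring."""
    earned = sum(mx for _, mx, ok in subsections if ok)
    total = sum(mx for _, mx, _ in subsections)
    return earned, total, 100 * earned / total


# Per-subsection breakdown: boolean score and points awarded.
for label, mx, ok in SUBSECTIONS:
    print(f"{label}: boolean={int(ok)} -> {mx if ok else 0}/{mx}")

earned, total, pct = section_percentage(SUBSECTIONS)
print(f"Model Details: {earned}/{total} = {pct:.1f}%")  # Model Details: 12/15 = 80.0%
```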
Color Coding
- Excellent coverage
- Moderate coverage
- Limited coverage