Folksy Generator — Evaluation Report
Date: 2026-02-17
Evaluator: Claude (automated)
Scope: Post-integration health check after three LLM augmentation phases
1. Project Structure Overview
folksy-generator/
├── folksy_generator.py # Main CLI generator (910 lines)
├── FOLKSY_GENERATOR_SPEC.md # Original project spec
├── GRAPH_ENHANCEMENT_SPEC.md # LLM graph augmentation spec (Phases 1-3)
├── CORPUS_GENERATION_SPEC.md # Corpus generation spec (next phase)
├── data/
│ ├── folksy_vocab.csv # Curated vocabulary (624 words, expanded from 534)
│ ├── folksy_vocab.csv.bak.* # Pre-expansion backup (534 words)
│ ├── folksy_relations.csv # Original ConceptNet edges (11,096 edges)
│ ├── folksy_relations_augmented.csv # LLM-generated edges (11,220 edges)
│ ├── classified_proverbs.csv # Labeled real proverbs for reference
│ ├── candidate_additions.csv # OOV words suggested by LLM (3,678 unique)
│ └── enhancement_log.csv # Processing log for all 3 phases
├── scripts/
│ ├── extract_from_conceptnet.py # One-time ConceptNet extraction (requires psql)
│ ├── extract_relations.py # Relation extraction helper
│ ├── classify_proverbs.py # Proverb classification
│ ├── expand_vocab.py # Phase: vocab expansion (+90 words)
│ ├── enhance_graph.py # Phase: LLM edge augmentation
│ ├── generate_raw_batch.sh # Bulk generation script
│ ├── polish_corpus.py # LLM polish pipeline
│ ├── filter_corpus.py # Quality filtering
│ ├── format_training_pairs.py # Training pair generation
│ └── compute_corpus_stats.py # Corpus statistics
├── examples/
│ ├── my_world.json # Fictional entity examples (5 entities)
│ └── sample_output.txt # Pre-integration sample output
├── schemas/
│ └── fictional_entities.schema.json
└── corpus/ # Empty — not yet populated
Entry point: `python3 folksy_generator.py` — no virtual environment, no dependencies beyond the Python 3.11 stdlib.
2. What the Three LLM Integration Phases Produced
Git history shows a single initial commit (`8c8a058`, "Initial 'folksy idiom' generator"). All three LLM augmentation phases were executed as data-pipeline operations rather than code commits — the results live in data files.
Phase 1: Per-Word Relationship Expansion
- 624 words processed through GLM4-32B
- 10,726 edges generated, 1,155 accepted (10.8% acceptance rate)
- 9,510 edges rejected as OOV (target words not in folksy vocab)
- 61 duplicates filtered
- Filled gaps in `AtLocation`, `UsedFor`, `HasA`, `MadeOf`, `PartOf`, `CapableOf`, `HasPrerequisite`, `Causes`, and `HasProperty`
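Per the acceptance numbers above, Phase 1 kept an edge only if its target word was in the folksy vocabulary and it was not an exact repeat. A minimal sketch of that filter (function and variable names here are hypothetical, not the pipeline's actual code):

```python
# Hypothetical sketch of the Phase 1 acceptance filter: reject edges whose
# target word is out-of-vocabulary, then drop exact duplicates.
def filter_edges(edges, vocab):
    accepted, seen = [], set()
    for start, relation, end in edges:
        if end not in vocab:          # OOV target -> rejected
            continue
        key = (start, relation, end)
        if key in seen:               # exact duplicate -> filtered
            continue
        seen.add(key)
        accepted.append(key)
    return accepted

vocab = {"scarf", "cotton", "wool"}
edges = [
    ("scarf", "MadeOf", "cotton"),
    ("scarf", "MadeOf", "cotton"),    # duplicate
    ("scarf", "MadeOf", "cashmere"),  # OOV target
]
print(filter_edges(edges, vocab))     # [('scarf', 'MadeOf', 'cotton')]
```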
Phase 2: Cross-Word Relationship Discovery (Bridge Words)
- 148 low-connectivity words targeted
- 6,272 bridge edges accepted
- This phase focused on connecting isolated vocabulary clusters via shared intermediate concepts
Phase 3: Property Enrichment
- 624 words processed for distinctive HasProperty edges
- 3,849 edges generated, 3,788 accepted (98.4% acceptance rate)
- 61 duplicates filtered
- Targeted at improving `false_equivalence` template output
Vocab Expansion (via expand_vocab.py)
- Original vocabulary: 534 words
- Current vocabulary: 624 words (+90 words added)
- Added words span all major categories: animal (18), landscape (16), tool (14), material (13), plant (13), structure (8), food (7), and 25 other categories
Combined Data Summary
| Dataset | Count |
|---|---|
| Original ConceptNet edges | 11,096 |
| LLM-augmented edges | 11,220 |
| Total edges (combined) | 22,316 |
| Original vocabulary | 534 |
| Expanded vocabulary | 624 |
| Candidate OOV words (not added) | 3,678 |
3. Term Database Statistics
Vocabulary by Category (36 categories)
| Category | Words | Category | Words |
|---|---|---|---|
| bird | 97 | fish | 16 |
| animal | 65 | spice | 16 |
| tool | 56 | fruit | 15 |
| plant | 43 | mineral | 14 |
| food | 38 | insect | 14 |
| material | 36 | structure | 13 |
| container | 34 | beverage | 9 |
| instrument | 28 | fabric | 9 |
| landscape | 27 | tree | 8 |
| vegetable | 24 | wood | 7 |
| building | 21 | herb | 7 |
| metal | 19 | rock | 6 |
| flower | 19 | water | 6 |
| vehicle | 18 | furniture | 5 |
| stone | 17 | clothing | 5 |
| weapon | 17 | shelter | 5 |
| — | — | crop, seed, organism, grain | 3-4 each |
Edge Distribution — Original ConceptNet
| Relation | Edges |
|---|---|
| AtLocation | 5,294 |
| UsedFor | 2,481 |
| CapableOf | 1,138 |
| ReceivesAction | 485 |
| HasProperty | 422 |
| HasA | 307 |
| HasPrerequisite | 261 |
| MadeOf | 181 |
| PartOf | 170 |
| Others (6 types) | 257 |
Edge Distribution — LLM Augmented
| Relation | Edges |
|---|---|
| HasProperty | 3,985 |
| HasA | 1,719 |
| PartOf | 1,247 |
| UsedFor | 1,230 |
| MadeOf | 1,217 |
| AtLocation | 1,008 |
| CapableOf | 288 |
| HasPrerequisite | 250 |
| Others (4 types) | 276 |
The augmented edges deliberately fill the gaps in the original ConceptNet data. `HasProperty` went from 422 to 4,407 total edges — critical for the `false_equivalence` template.
4. Sample Generated Output (30 Sayings)
Generated with python3 folksy_generator.py --count 30 using the full augmented graph:
- An scarf ain't nothing but cotton that met some wool.
- The only difference between a hummingbird and a dodo is metabolism.
- An salt ain't nothing but ore that met some crystals.
- Funny how the earthworm never has enough food for itself.
- What's a coop but a kitchen with sound?
- My grandmother used to say, 'spooning the dessert won't bring you eating.'
- Don't take the wheel and then gripe about the hull.
- A bamboo don't come without its water, now does it?
- Nobody's got less salsa than the man who makes the mango.
- That's like eating the sea and complaining the savanna tastes off.
- My daddy always said, can't have waking up in morning without coffee.
- Take the bison out of meat and all you've got left is salty taste flesh.
- Like baiting the flock and hoping for keep as pet.
- The ice's family always goes without cool body.
- There's a fella who takes the wax and says the sugar's no good.
- That's just holding the drawer and praying for store blanket.
- You know what they say, a mica with no schist is just a rough surface rock.
- An silver ain't nothing but hairbrushes that met some alloy.
- A kite is just a pelican that's got catch wind.
- Like making the denim and hoping for material.
- The nut feeds everyone's fit bolt but its own.
- The pitcher's family always goes without throw fast ball.
- A nail is just a weapon that's got smooth length.
- You want lid? Well, first you're gonna need container.
- Don't build the micrometer and say you ain't got workshop.
- Ain't no sleeping at night ever came from nothing — you need bed.
- What's a cicada but a lacebug with nocturnal behavior?
- Don't drink the dish and then gripe about the gnocchi.
- You can't put out a herring and then wonder where all the herringbone came from.
- That's just lorikeeting the fruit and praying for breaking wind.
5. Quality Assessment
Rating Summary
I rated each of the 30 sayings on a 3-tier scale (Good / Okay / Bad):
| Rating | Count | % | Description |
|---|---|---|---|
| Good | 8 | 27% | Sounds natural, humorous, structurally solid |
| Okay | 9 | 30% | Semantically coherent but grammatically rough |
| Bad | 13 | 43% | Broken grammar, nonsensical, or artifact leakage |
Good Examples (natural-sounding, humorous)
- "Nobody's got less salsa than the man who makes the mango."
- "There's a fella who takes the wax and says the sugar's no good."
- "A bamboo don't come without its water, now does it?"
- "Don't take the wheel and then gripe about the hull."
- "Ain't no sleeping at night ever came from nothing — you need bed."
- "My daddy always said, can't have waking up in morning without coffee."
- "What's a cicada but a lacebug with nocturnal behavior?"
- "You can't put out a herring and then wonder where all the herringbone came from."
Common Issues Identified
1. Article / Grammar Errors (frequent)
- "An scarf ain't nothing but..." — should be "A scarf"
- "An silver ain't nothing but..." — should be "Silver"
- "An salt ain't nothing but..." — should be "Salt"
- "A have children don't come without..." — broken slot fill leaking action phrase as noun
2. Multi-Word ConceptNet Phrases Leaking Into Templates (frequent)
- "throw fast ball", "fit bolt", "cool body", "keep as pet", "store blanket"
- "waking up in morning", "sleeping at night", "salty taste"
- "breaking wind", "rough surface"
- These are raw ConceptNet concept IDs that should have been filtered or reformatted
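One possible tightening, sketched with a hypothetical helper (the generator's actual filter is `_short_concepts()`, which caps at 3 words; this version caps at 2, which would still admit two-word phrases like "cool body" — a single-word-only cap would be stricter):

```python
# Hypothetical stricter slot-fill filter: reject ConceptNet surface texts
# longer than two words ("throw fast ball", "keep as pet"), which the
# samples above show breaking sentence flow.
def usable_as_slot_fill(concept: str, max_words: int = 2) -> bool:
    words = concept.split()
    return 0 < len(words) <= max_words

print(usable_as_slot_fill("herring"))          # True
print(usable_as_slot_fill("keep as pet"))      # False
print(usable_as_slot_fill("throw fast ball"))  # False
```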
3. Nonsensical Verb Conjugation in Futile Preparation (severe)
- "lorikeeting the fruit" — `lorikeet` treated as a verb
- "fooding the earthworm" — `food` treated as a verb
- "jeansing the denim" — `jeans` treated as a verb
- "safariing the lion" — `safari` treated as a verb
- The `_gerund()` function applies gerunding to ANY UsedFor target, including nouns
4. LLM Enhancement Artifacts Leaking (moderate)
- "bridge word: plate" appearing in output text
- "bridge 2: food" appearing in output text
- "bridge word: absorption" appearing in output text
- These are raw LLM response fragments that weren't properly cleaned during Phase 2
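A one-pass scrub of the augmented CSV could catch these fragments before bulk generation. A sketch, assuming the artifacts all match the "bridge …:" shapes shown above (the regex and helper name are mine, not the pipeline's):

```python
import re

# Matches "bridge word:", "bridge 2:", "bridge word 3:" etc., case-insensitive.
ARTIFACT = re.compile(r"bridge(?:\s+word)?\s*\d*\s*:", re.IGNORECASE)

def is_contaminated(field: str) -> bool:
    return bool(ARTIFACT.search(field))

print(is_contaminated("bridge word: plate"))       # True
print(is_contaminated("bridge 2: food"))           # True
print(is_contaminated("bridge word: absorption"))  # True
print(is_contaminated("plate"))                    # False
```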
5. Semantic Mismatches (occasional)
- "A lynx is just a earthworm that's got feline." — wrong category siblings
- "That's like eating the sea and complaining the savanna tastes off." — sea and savanna are not parts of a river
- "A emu is just a ferret that's got walk backwards." — cross-class comparison
Per-Template Quality Assessment
| Template | Typical Quality | Key Issue |
|---|---|---|
| deconstruction | Okay | Multi-word properties leak; article errors with "An" |
| denial_of_consequences | Good | Best template; LLM artifacts occasionally leak through |
| ironic_deficiency | Okay-Bad | Multi-word action phrases used as nouns ("throw fast ball") |
| futile_preparation | Bad | Nouns gerunded as verbs; worst template overall |
| hypocritical_complaint | Okay | Some odd part-of relationships; generally coherent structure |
| tautological_wisdom | Good | Simple structure avoids most issues; multi-word phrases still leak |
| false_equivalence | Good | Benefited most from Phase 3 property enrichment |
6. Errors, Warnings, and Issues
No Errors at Runtime
- Generator runs without crashes on all template types
- All CLI flags work (`--template`, `--count`, `--seed`, `--category`, `--debug`, `--json`, `--entities`, `--pure-conceptnet`, `--llm-weight-boost`)
- JSON output mode produces valid JSONL with complete metadata
- Fictional entity generation works
Issues Found
| Severity | Issue | Impact |
|---|---|---|
| High | LLM Phase 2 artifacts in augmented data ("bridge word:", "bridge 2:") | Raw LLM response fragments leak into generated sayings |
| High | Nouns gerunded as verbs in futile_preparation | "lorikeeting", "fooding", "jeansing" — template fundamentally broken for non-verb UsedFor targets |
| Medium | Multi-word ConceptNet phrases not filtered | "throw fast ball", "keep as pet" break sentence flow |
| Medium | Article logic doesn't handle "a" vs "an" properly for all cases | "An scarf", "An silver", "An salt" |
| Low | No test suite exists | No automated validation of output quality |
| Low | No virtual environment or requirements.txt | Only stdlib needed currently, but will need deps for corpus generation phase |
| Info | Corpus directory is empty | Expected — corpus generation is the next phase |
7. Readiness Assessment for Corpus Generation
Ready
- Template engine is functional and produces output across all 7 meta-template families
- Augmented graph significantly improves vocabulary coverage (22,316 total edges)
- Vocab expansion added 90 words to cover previously sparse categories
- JSON output mode with full debug metadata is working — ready for bulk generation
- Deduplication logic works (seen_text, seen_slots, seed_usage caps at 30)
- Fictional entity support is implemented and functional
- All corpus pipeline scripts exist (`generate_raw_batch.sh`, `polish_corpus.py`, `filter_corpus.py`, `format_training_pairs.py`, `compute_corpus_stats.py`)
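The deduplication scheme noted above can be pictured like this (the names `seen_text`, `seen_slots`, and `seed_usage` and the cap of 30 come from the report; the implementation body is a stand-in, not the generator's actual code):

```python
from collections import Counter

seen_text, seen_slots = set(), set()
seed_usage = Counter()
SEED_CAP = 30  # per the report: seed_usage caps at 30

def accept(text: str, slots: tuple, seed: str) -> bool:
    if text in seen_text or slots in seen_slots:
        return False                 # exact text or slot-combination repeat
    if seed_usage[seed] >= SEED_CAP:
        return False                 # seed word already used 30 times
    seen_text.add(text)
    seen_slots.add(slots)
    seed_usage[seed] += 1
    return True

print(accept("A scarf ain't ...", ("scarf", "cotton"), "scarf"))  # True
print(accept("A scarf ain't ...", ("scarf", "cotton"), "scarf"))  # False
```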
Should Fix Before Corpus Generation
- Clean Phase 2 artifacts from `folksy_relations_augmented.csv` — grep for "bridge word" and "bridge 2" in the surface_text/end_word fields and remove or repair those edges
- Fix `futile_preparation` gerunding — the `_gerund()` function needs a check that the UsedFor target is actually a verb before conjugating it; alternatively, filter UsedFor targets to verb-like words only
- Filter multi-word ConceptNet phrases — the `_short_concepts()` helper caps at 3 words, but many 2-3 word phrases are still awkward as slot fills ("salty taste", "cool body"); consider capping at 2 or adding a verb/noun POS check
- Fix article logic — the `_a()` function (lines 680-684) only checks the first character, yet still emits "An salt" even though "salt" starts with a consonant
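For the article fix, a first-letter heuristic already handles the failing cases above. A minimal sketch (the name `_a()` comes from the report; this body is mine and ignores vowel-sound exceptions like "hour" or "unicorn"):

```python
def article(word: str) -> str:
    # "an" before a vowel letter, "a" otherwise; ignores sound-based
    # exceptions ("hour", "unicorn"), which the folksy vocab may not need.
    return "an" if word[:1].lower() in "aeiou" else "a"

print(article("scarf"), "scarf")  # a scarf
print(article("ore"), "ore")      # an ore
print(article("salt"), "salt")    # a salt
```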
Nice to Have
- Add a basic test suite (even just smoke tests that confirm each template generates output)
- Create `requirements.txt` (currently stdlib-only, but the corpus phase will need `requests` at minimum)
- Review the 3,678 candidate OOV words — none exceeded the frequency threshold of 3+ for auto-addition, but manual review could surface useful additions
Overall Verdict
The template generator works but produces rough output. This is expected and acceptable because the CORPUS_GENERATION_SPEC explicitly accounts for it — the raw output goes through LLM polishing (Phase 2 of corpus generation) where GLM4-32B fixes grammar and discards unsalvageable sayings. The spec estimates a 20-30% discard rate; based on this evaluation, the actual discard rate will likely be 40-50% due to the issues above.
Fixing the four "Should Fix" items before corpus generation would:
- Reduce the discard rate (saving LLM compute time)
- Improve the quality floor of raw output (giving the polish LLM better material to work with)
- Eliminate artifact contamination that could propagate into training data
The generator is functional but not polished — appropriate for its role as a raw material source in a pipeline that includes LLM correction downstream.