Initial 'folksy idiom' generator
commit 8c8a058301
11 changed files with 14485 additions and 0 deletions
FOLKSY_GENERATOR_SPEC.md (Normal file, 44 additions)
@@ -0,0 +1,44 @@
# Folksy Idiom Generator — Project Spec

## Overview

Build a procedural fake-proverb generator that produces sayings which *sound* like real folk wisdom but are semantically hollow. The system uses ConceptNet 5 relationships between concrete nouns to fill typed slots in proverb templates, ensuring every generated saying contains real-world object relationships dressed up in wisdom-sounding syntax.

The core insight: real folk sayings encode real relationships (corn feeds chickens, ducks live on ponds, pickles are cucumbers plus dill). Fake sayings use the *same* real relationships in proverb-shaped sentence frames. The listener validates the semantic links, parses the structure as wisdom, and then realizes it doesn't actually *teach* anything. That gap is the humor.
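
A minimal Python sketch of the typed-slot idea, assuming each relationship edge is a (start, relation, end, weight) record using ConceptNet relation names such as `AtLocation` and `MadeOf`. The `Edge` and `Template` classes, the `fill` function, and the sample frame are hypothetical and are not the project's templates or code. A frame only accepts noun pairs linked by the relation it names, and confidence weights bias which edge gets picked:

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A typed relationship between two vocabulary nouns (hypothetical record shape)."""
    start: str
    relation: str  # ConceptNet relation name, e.g. "AtLocation"
    end: str
    weight: float = 1.0


@dataclass(frozen=True)
class Template:
    """A proverb-shaped frame whose {x} and {y} slots must be linked by `relation`."""
    text: str
    relation: str


def fill(template: Template, edges: list[Edge], rng: random.Random) -> str | None:
    # Only edges of the required type may fill the slots, so every saying
    # rests on a real object relationship.
    candidates = [e for e in edges if e.relation == template.relation]
    if not candidates:
        return None  # no edge licenses this frame
    # Higher-confidence edges are proportionally more likely to be picked.
    edge = rng.choices(candidates, weights=[e.weight for e in candidates], k=1)[0]
    return template.text.format(x=edge.start, y=edge.end)


if __name__ == "__main__":
    edges = [
        Edge("duck", "AtLocation", "pond", 2.0),
        Edge("pickle", "MadeOf", "cucumber", 1.5),
    ]
    # Hypothetical frame, not one of the spec's meta-templates.
    frame = Template("Take the {x} off the {y} and it still smells of the {y}.", "AtLocation")
    print(fill(frame, edges, random.Random(0)))
```

With only one `AtLocation` edge available, this prints `Take the duck off the pond and it still smells of the pond.` A frame whose relation has no matching edge returns `None` instead of producing an unsupported saying.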
## File Organization

```
folksy-generator/
├── FOLKSY_GENERATOR_SPEC.md        # This file
├── folksy_generator.py             # Main CLI tool
├── data/
│   ├── folksy_vocab.csv            # Curated vocabulary
│   ├── folksy_relations.csv        # Relationship edges
│   └── classified_proverbs.csv     # Labeled real proverbs
├── schemas/
│   └── fictional_entities.schema.json
├── examples/
│   ├── my_world.json               # Example fictional entities
│   └── sample_output.txt           # Example generated sayings
└── scripts/
    └── extract_from_conceptnet.py  # One-time extraction script (requires psql access)
```
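
The labeled proverbs in `data/classified_proverbs.csv` use a quoted four-column header (`proverb`, `valid`, `meta_template`, `notes`), `TRUE`/`FALSE` strings, and doubled quotes for embedded quotation marks, so Python's standard `csv` module can read them directly. A sketch, assuming UTF-8 encoding; `load_classified` and `valid_counts` are illustrative names, not existing project functions:

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def load_classified(f: TextIO) -> list[dict]:
    """Read classified_proverbs.csv rows, turning the valid column into a bool."""
    rows = []
    for row in csv.DictReader(f):
        row["valid"] = row["valid"] == "TRUE"
        rows.append(row)
    return rows


def valid_counts(rows: Iterable[dict]) -> Counter:
    """Count usable (valid) proverbs per meta_template label."""
    return Counter(r["meta_template"] for r in rows if r["valid"])


# Three rows copied from the data file; the notes are shortened.
SAMPLE = '''"proverb","valid","meta_template","notes"
"A small leak will sink a great ship.","TRUE","causal_chain","A=leak, B=ship."
"""IF's"" and ""But's"" butter no bread.","FALSE","NONE","Depends on wordplay."
"Early ripe, early rotten.","TRUE","causal_chain","A=ripeness, B=rottenness."
'''

if __name__ == "__main__":
    rows = load_classified(io.StringIO(SAMPLE))
    print(rows[1]["proverb"])  # "IF's" and "But's" butter no bread.
    print(valid_counts(rows))  # Counter({'causal_chain': 2})
```

Against the real file, open it with `open("data/classified_proverbs.csv", newline="", encoding="utf-8")` and pass the handle to `load_classified`.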
## Data Sources

- PostgreSQL database `conceptnet5` with full ConceptNet 5 dataset (37M edges, 33M nodes, 50 relation types)
- Curated folksy vocabulary: 500-800 concrete nouns tagged with categories
- Relationship edges between vocabulary words with typed relations and confidence weights
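
Reducing the full graph to edges between vocabulary words might look like the following. This assumes query results have already been reduced to `(relation URI, start URI, end URI, weight)` tuples in ConceptNet's URI style (`/r/AtLocation`, `/c/en/duck/n`); the `term` and `filter_edges` helpers, the tuple shape, and the default `min_weight` of 1.0 are assumptions, and the actual query in `extract_from_conceptnet.py` is not shown here:

```python
from __future__ import annotations

from typing import Iterable, Optional


def term(uri: str) -> Optional[str]:
    """Map an English concept URI like /c/en/frying_pan/n to 'frying pan'."""
    parts = uri.split("/")  # ['', 'c', 'en', 'frying_pan', 'n']
    if len(parts) < 4 or parts[1] != "c" or parts[2] != "en":
        return None  # non-English or non-concept node
    return parts[3].replace("_", " ")


def filter_edges(
    rows: Iterable[tuple[str, str, str, float]],
    vocab: set[str],
    min_weight: float = 1.0,
) -> list[tuple[str, str, str, float]]:
    """Keep edges whose endpoints are both curated nouns, deduplicated by max weight."""
    best: dict[tuple[str, str, str], float] = {}
    for rel_uri, start_uri, end_uri, weight in rows:
        a, b = term(start_uri), term(end_uri)
        if a is None or b is None or a == b:
            continue
        if a not in vocab or b not in vocab or weight < min_weight:
            continue
        key = (a, rel_uri.rsplit("/", 1)[-1], b)  # '/r/AtLocation' -> 'AtLocation'
        best[key] = max(weight, best.get(key, 0.0))
    return sorted((s, r, e, w) for (s, r, e), w in best.items())
```

The 1.0 cutoff is a placeholder; raising `min_weight` trades vocabulary coverage for edges ConceptNet is more confident about.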
## Meta-Template Families

1. **deconstruction** — "A without B is just humble D"
2. **denial_of_consequences** — "Don't create conditions for B and deny B"
3. **ironic_deficiency** — "Producer of X lacks X"
4. **futile_preparation** — "Like doing A and hoping for unrelated Y"
5. **hypocritical_complaint** — "Consumes X, complains about Y"
6. **tautological_wisdom** — "Obviously X leads to Y, stated as wisdom"
7. **false_equivalence** — "A is just B with/without property P"

See the full spec in the project history for detailed template definitions and example chains.
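
As an illustration of one family, the deconstruction frame ("A without B is just humble D") can be driven entirely by composition edges, as in the pickle example from the overview. The sketch assumes `MadeOf` triples of the form (whole, relation, part) and only fires for wholes with exactly two known parts; `deconstructions` is a hypothetical helper, not the project's template engine:

```python
from __future__ import annotations

from collections import defaultdict
from itertools import permutations


def deconstructions(edges: list[tuple[str, str, str]]) -> list[str]:
    """Render 'A without B is just humble D' from (whole, 'MadeOf', part) edges."""
    parts: dict[str, set[str]] = defaultdict(set)
    for whole, relation, part in edges:
        if relation == "MadeOf":
            parts[whole].add(part)
    sayings = []
    for whole in sorted(parts):
        if len(parts[whole]) != 2:
            continue  # with three or more parts, removing one doesn't leave a single D
        # MadeOf alone can't say which part is the base, so emit both orders.
        for removed, remaining in permutations(sorted(parts[whole]), 2):
            sayings.append(f"{whole.capitalize()} without {removed} is just humble {remaining}.")
    return sayings


if __name__ == "__main__":
    edges = [
        ("pickle", "MadeOf", "cucumber"),
        ("pickle", "MadeOf", "dill"),
        ("duck", "AtLocation", "pond"),
    ]
    for saying in deconstructions(edges):
        print(saying)
```

Run as-is, it prints both orders for the pickle, and only `Pickle without dill is just humble cucumber.` lands. That suggests the real template needs some way to tell the base ingredient from the additive.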
data/classified_proverbs.csv (Normal file, 231 additions)
@@ -0,0 +1,231 @@
"proverb","valid","meta_template","notes"
"A big wife and a big barn, will never do a man any harm.","FALSE","NONE","Primary nouns are people/roles (wife, man). About social roles and marriage."
"A merry companion is music on a journey.","FALSE","NONE","Primary noun is a person/role (companion). About social relationships."
"A false friend and a shadow stay only while the sun shines.","FALSE","NONE","Primary noun is a person/role (friend). About human character/loyalty."
"All is fair in love and war, but friendship there is truth.","FALSE","NONE","About abstract concepts (love, war, friendship, truth). No concrete swappable nouns."
"A clock will run without watching it.","FALSE","NONE","Only one concrete noun (clock). Meaning is about autonomy/trust, not object relationships."
"A good neighbor, a found treasure!","FALSE","NONE","Primary noun is a person/role (neighbor). About social relationships."
"A friend to everyone is a friend to nobody.","FALSE","NONE","Primary nouns are people/roles (friend). About human social behavior."
"A small leak will sink a great ship.","TRUE","causal_chain","Small defect (leak) cascades to big consequence (ship sinking). Nouns swappable: a small crack will collapse a great wall. A=leak, B=ship."
"A living dog is better than a dead lion.","FALSE","NONE","Meaning depends on the cultural symbolism of dog (lowly) vs lion (noble). Swapping nouns breaks the metaphor about humility vs pride."
"A good wife is the best household furniture.","FALSE","NONE","Primary noun is a person/role (wife). About social roles."
"A pebble and a diamond are alike to a blind man.","FALSE","NONE","Primary actor is a person (blind man). About human perception/disability. Nouns aren't freely swappable without the specific blindness context."
"An arrogant bug is a cocky roach.","TRUE","false_equivalence","A=arrogant bug, B=cocky roach. Both insects, differ in specificity. A {A} is just a {B} with {P}. Nouns swappable: a fancy hat is a decorated cap."
"Better an hour early and stand and wait than a moment behind time.","FALSE","NONE","About abstract concepts (time, punctuality). No concrete swappable nouns."
"Better a dollar earned than ten inherited.","FALSE","NONE","About abstract concepts (earning vs inheriting). Only one concrete noun type (dollar). About human work ethic."
"Better to ask twice than lose your way once.","FALSE","NONE","About human behavior (asking, navigation). No concrete nouns to swap."
"Better bowlegs than no legs at all.","FALSE","NONE","About human body parts. Not about object relationships. Only one noun (legs)."
"Better to heaven in rags than to hell in embroidery.","FALSE","NONE","About morality and abstract concepts (heaven, hell). Religious/moral teaching."
"Curses, like chickens, come home to roost.","FALSE","NONE","Core meaning is about abstract concept (curses/karma). Chickens are a simile vehicle, not a swappable structural noun."
"Don't dare kiss an ugly girl, she'll tell the world about it.","FALSE","NONE","Primary nouns are people (girl). About human social behavior."
"Don't taste every man's soup, you'll burn your mouth.","FALSE","NONE","About human behavior (minding others' business). The soup is metaphor for other people's affairs."
"Even a fish wouldn't get caught if he kept his mouth shut.","FALSE","NONE","Meaning depends on wordplay (mouth=talking). About human behavior (keeping quiet). Swapping fish breaks the pun."
"Every donkey thinks itself worthy of standing with the king's horses.","FALSE","NONE","About human pride/arrogance. King is a social role. The animals represent social hierarchy."
"Every path has a puddle.","FALSE","NONE","Only two nouns but meaning is about life's difficulties. Metaphorical, not about physical path-puddle relationship."
"Early ripe, early rotten.","TRUE","causal_chain","Quick ripening leads to quick rotting. Swappable: early bloom, early wilt. A=ripeness, B=rottenness. Physical process chain."
"Every field looks green from a distance, even a cemetery.","FALSE","NONE","Meaning is about human perception and illusion (grass-is-greener). Cemetery adds dark humor that breaks on swap."
"Everybody lays his load on the willing horse.","FALSE","NONE","About human behavior (exploiting willing people). Horse represents a person. Primary subject is people."
"Fools use bets for arguments.","FALSE","NONE","Primary noun is people/role (fools). About human behavior."
"He who holds the ladder is as bad as the thief.","FALSE","NONE","Primary nouns are people/roles (thief, accomplice). About morality and complicity."
"If you come to the end of your rope -- tie a knot in it and hang on.","FALSE","NONE","Rope is metaphor for endurance/patience. About human perseverance, not rope mechanics."
"If you are always dwelling in trouble, change your address.","FALSE","NONE","Wordplay on dwelling/address. About human behavior and attitude."
"If wishes were horses, then beggars would ride.","FALSE","NONE","Primary nouns include people/roles (beggars). About abstract concept (wishing). Depends on specific wish-horse wordplay."
"""IF's"" and ""But's"" butter no bread.","FALSE","NONE","Depends on wordplay (ifs and buts as abstract concepts). Only one concrete noun (bread)."
"It takes a good many shovelfuls to bury the truth.","FALSE","NONE","Core meaning is about abstract concept (truth). Shovelfuls are metaphorical."
"It is better to have a hen tomorrow than an egg today.","TRUE","proportional_mismatch","Compares small-now vs big-later using concrete nouns. Swappable: better a tree tomorrow than a seed today. A=hen (big), B=egg (small). Time/value mismatch."
"Living is like licking honey off a thorn.","FALSE","NONE","About abstract concept (living/life). The honey-thorn image is metaphor for life's bittersweet nature."
"Listen at the keyhole and you'll hear news of yourself.","FALSE","NONE","About human behavior (eavesdropping). Primary subject is a person."
"Lend your money and lose your friend.","FALSE","NONE","Primary nouns are people/roles (friend) and abstract (money). About social relationships."
"Man is the only animal that can be skinned more than once.","FALSE","NONE","Primary noun is a person (man). About human gullibility."
"Must is a hard nut to crack.","FALSE","NONE","Depends on wordplay: 'must' (obligation) as a nut. Breaks on noun swap."
"No matter how high a bird flies, it has to come down for water.","FALSE","NONE","About human pride/ambition. Bird represents a person. Meaning is about humility."
"Nothing dries faster than a tear.","FALSE","NONE","Only one concrete noun (tear). About human emotions and their transience."
"Nothing is gained by having one donkey call another ""Long Ears!""","FALSE","NONE","About human hypocrisy (pot calling kettle). Animals represent people."
"Never stop the plough to catch a mouse.","TRUE","proportional_mismatch","Big important task (ploughing) vs tiny distraction (mouse). Swappable: never halt the ship to chase a gull. A=plough (big task), B=mouse (small distraction)."
"No piper ever suited all ears.","FALSE","NONE","Primary noun is a person/role (piper). About inability to please everyone."
"One who thinks he can live without others is mistaken but he who thinks others cannot live without him are more mistaken.","FALSE","NONE","Entirely about people and social relationships. No concrete nouns."
"One eyewitness is better than ten hearsays.","FALSE","NONE","About abstract concepts (testimony, evidence). Primary noun is a person/role (eyewitness)."
"One does not put beauty in a kettle.","FALSE","NONE","About abstract concept (beauty). Only one concrete noun (kettle)."
"Promises won't butter any bread.","FALSE","NONE","About abstract concept (promises). Only one concrete noun (bread). About human unreliability."
"Pleasant hours fly fast.","FALSE","NONE","About abstract concepts (time, pleasure). No concrete nouns."
"Sickness comes in haste and goes at leisure.","FALSE","NONE","About abstract concept (sickness). No concrete swappable nouns."
"Swallows and sparrows cannot understand the ambitions of swans.","FALSE","NONE","Animals represent social classes of people. About human ambition and hierarchy."
"The sun doesn't shine on the same dog's back every day.","FALSE","NONE","About fortune/luck changing. Dog represents a person. Metaphorical."
"The best patch is of the same cloth.","TRUE","material_transformation","Repair material should match original material. Swappable: the best weld is of the same metal. A=patch, B=cloth. About matching materials."
"The stable wears out a horse more than a road.","FALSE","NONE","About inactivity being worse than activity. Horse represents a person. Metaphorical lesson about human laziness."
"When the well is dry, you know the worth of water.","TRUE","tautological_wisdom","Absence reveals value. Swappable: when the pantry is bare, you know the worth of bread. A=well (container), B=water (resource)."
"When one has seen the bear in the woods, he hears his growl in every bush.","FALSE","NONE","About human psychology (fear, paranoia). The person is the primary subject."
"Weeds need no sowing.","TRUE","tautological_wisdom","Unwanted things propagate without effort. Swappable: rust needs no invitation. States an obvious property of weeds as wisdom. A=weeds, B=sowing."
"You can't anymore give away something you ain't got than you can come back from someplace you haven't been.","FALSE","NONE","About abstract concept (possession, presence). No concrete swappable nouns. Philosophical truism."
"You never know the length of a snake until it is dead.","FALSE","NONE","About judging things (people) only in hindsight. Snake metaphorically represents a threat/person. Meaning is about human judgment."
"You can't tell the depth of the well by the length of the handle on the pump.","TRUE","uncategorized","Surface measurement doesn't reveal hidden depth. Swappable: can't tell the weight of a chest by the size of its lock. Concrete nouns: well, handle, pump."
"You can't put out old heads on young shoulders.","FALSE","NONE","Primary nouns are people/body parts (heads, shoulders). About human wisdom and youth."
"Do a little well and you do much.","FALSE","NONE","About abstract concept (quality vs quantity of effort). No concrete nouns."
"A bad broom leaves a dirty room.","TRUE","causal_chain","Defective tool leads to poor result. Swappable: a dull axe leaves a ragged log. A=bad broom, B=dirty room. Concrete cause-effect."
"They must hunger in frost who will not work in heat.","FALSE","NONE","About human behavior (work ethic). Primary subject is people. About consequences of laziness."
"A cracked plate will last as long as a sound one.","TRUE","uncategorized","Imperfect object still functions. Swappable: a dented bucket holds as much as a new one. Concrete nouns: plate. About physical durability."
"Water run by will does not turn a mill.","TRUE","tautological_wisdom","Resource past the mechanism can't power it. Swappable: steam vented out does not drive a piston. A=water, B=mill. Physical cause-effect."
"Every pea helps to fill the pod.","TRUE","tautological_wisdom","Small contributions add up to fill the container. Swappable: every brick helps to build the wall. A=pea, B=pod."
"One watch set right will do to set many by.","TRUE","uncategorized","One correct reference calibrates others. Swappable: one true plumb line will straighten many walls. Concrete nouns: watch."
"Children and fools tell the truth.","FALSE","NONE","Primary nouns are people/roles (children, fools). About human behavior."
"Little children step on one's lap; tall ones tread on one's heart.","FALSE","NONE","Primary nouns are people (children). About parenting and emotional pain."
"He who rides slowly gets just as far, only it takes a little longer.","FALSE","NONE","About human behavior (patience). Primary subject is a person."
"Bad breath is better than no breath at all.","FALSE","NONE","Wordplay on breath (breathing vs bad breath). About being alive. Depends on pun."
"Too many square meals make too many round people.","FALSE","NONE","Wordplay on square (meals) and round (people/overweight). Depends on pun. About human behavior."
"When you feel all steamed up, remember the tea kettle -- it is always up to its neck in hot water and it still sings.","FALSE","NONE","Wordplay on steamed up and hot water. About human emotions (anger). Metaphorical."
"You can't make cookies when you haven't got the dough.","FALSE","NONE","Wordplay on dough (money vs baking ingredient). Meaning depends on pun."
"A bad penny always turns up","FALSE","NONE","Penny represents an unwanted person. About human social dynamics. Metaphorical."
"A bird in the hand is worth two in the bush","TRUE","proportional_mismatch","Certain small possession vs uncertain larger quantity. Swappable: a fish on the hook is worth five in the stream. A=bird in hand, B=two in bush."
"A dog is a man's best friend","FALSE","NONE","Primary noun is a person (man). About human-animal social bond. Not swappable structurally."
"A mill cannot grind with the water that is past","TRUE","tautological_wisdom","Missed resource can't power the mechanism. Swappable: an oven can't bake with the heat already gone. A=mill, B=water. Same as #66 variant."
"A miss is as good as a mile","FALSE","NONE","Wordplay on miss/mile. About abstract concept (near-misses). Depends on pun."
"A rolling stone gathers no moss","TRUE","uncategorized","Moving object doesn't accumulate surface growth. Swappable: a spinning wheel collects no dust. A=stone, B=moss."
"A watched man never plays","FALSE","NONE","Primary noun is a person (man). About human behavior under surveillance."
"A watched pot/kettle never boils","TRUE","tautological_wisdom","Observation seems to delay the process. Swappable: a watched oven never heats. A=pot/kettle, B=boiling. Concrete object-process relationship."
"All hands on deck/to the pump","FALSE","NONE","Primary nouns are people (hands=crew). About human collective effort."
"All is grist that comes to the mill","TRUE","tautological_wisdom","Everything entering the mechanism gets processed. Swappable: all is fuel that enters the furnace. A=grist, B=mill."
"An apple a day keeps the doctor away","FALSE","NONE","Primary noun includes a person/role (doctor). About health advice. Culturally specific to apple."
"An army marches on its stomach","FALSE","NONE","Primary noun is people/collective (army). About human logistics and morale."
"Any port in a storm","FALSE","NONE","Port/storm are metaphors for refuge/crisis. About human desperation and lowered standards."
"As you sow so shall you reap","FALSE","NONE","About human behavior (actions and consequences). The sowing/reaping is metaphorical for moral behavior."
"Barking dogs seldom bite","FALSE","NONE","Dogs represent threatening people. About human behavior (bluster vs action)."
"Before setting out on a mission of vengeance, dig two graves","FALSE","NONE","About human behavior (revenge). Primary subject is a person. Abstract moral teaching."
"Beggars cannot be choosers","FALSE","NONE","Primary noun is a person/role (beggars). About human social position."
"Big fish eat little fish","FALSE","NONE","Fish represent people in power dynamics. About human hierarchy and exploitation."
"Birds of a feather (flock together)","FALSE","NONE","Birds represent people. About human social grouping by similarity."
"Buy cheap, buy twice","FALSE","NONE","No concrete swappable nouns. About abstract purchasing behavior."
"Calm seas never made a good sailor","FALSE","NONE","Primary noun is a person/role (sailor). About human character development through adversity."
"Coffee and love taste best when hot (Ethiopian proverb)","FALSE","NONE","Mixes concrete (coffee) with abstract (love). About human emotion. Not freely swappable."
"Cold hands, warm heart","FALSE","NONE","About human body/character. Primary nouns are body parts. About personality."
"Criss-cross, applesauce","FALSE","NONE","Children's rhyme, not a wisdom proverb. No meaningful structure to classify."
"Cross the stream where it is shallowest","TRUE","uncategorized","Take the easiest path through an obstacle. Swappable: climb the wall where it is lowest. A=stream, property=shallowest. Concrete spatial relationship."
"Cut your coat according to your cloth","TRUE","material_transformation","Output must match available input material. Swappable: shape your pot according to your clay. A=coat, B=cloth. Material constraint."
"Do not carry coals to Newcastle","FALSE","NONE","Meaning depends on specific cultural knowledge (Newcastle = coal town). Breaks on noun swap without the cultural reference."
"Do not keep a dog and bark yourself","FALSE","NONE","About human behavior (delegation). Dog represents a subordinate/servant. About social roles."
"Do not make a mountain out of a mole hill","TRUE","proportional_mismatch","Inflating small thing (molehill) to large thing (mountain). Swappable: don't make an ocean out of a puddle. A=mountain, B=molehill."
"Do not put the cart before the horse","TRUE","uncategorized","Sequence/order matters for physical function. Swappable: don't put the roof before the walls. A=cart, B=horse. Concrete ordering relationship."
"Do not put too many irons in the fire","TRUE","proportional_mismatch","Too many items overwhelm the resource. Swappable: don't put too many pots on the stove. A=irons, B=fire. Concrete capacity limit."
"Do not put new wine into old bottles","TRUE","material_transformation","New content incompatible with old container. Swappable: don't pour hot soup into a cold jar. A=new wine, B=old bottles. Material mismatch."
"Do not teach your Grandmother to suck eggs","FALSE","NONE","Primary noun is a person/role (Grandmother). About human presumption and social roles."
"Do not throw the baby out with the bathwater","FALSE","NONE","Primary noun is a person (baby). About discarding the valuable with the worthless. Baby is not a swappable object."
"Don't take any wooden nickels","FALSE","NONE","About human gullibility/deception. The wooden nickels are metaphor for scams."
"East is east, and west is west (and never the twain shall meet)","FALSE","NONE","About abstract concepts (cultural differences). Directions are not concrete swappable nouns in meaningful way."
"Even a worm will turn","FALSE","NONE","Worm represents a downtrodden person. About human behavior (fighting back when pushed too far)."
"Every dog has his day","FALSE","NONE","Dog represents a person. About human fortune and fairness. Abstract."
"Every stick has two ends","TRUE","tautological_wisdom","Every tool/situation has two aspects. Swappable: every blade has two edges. A=stick, B=two ends. States obvious physical property as wisdom."
"Feed a cold, starve a fever","FALSE","NONE","About abstract concepts (illness). Medical folk advice. Cold and fever aren't concrete swappable nouns."
"Fight fire with fire","FALSE","NONE","Only one concrete noun repeated (fire). About strategy/approach. Metaphorical."
"Fine words butter no parsnips","FALSE","NONE","About abstract concept (words/promises). Similar to #48. Only one concrete noun (parsnips)."
"First come, first served","FALSE","NONE","About human behavior (queuing/priority). No concrete nouns."
"Fish and guests smell after three days","FALSE","NONE","Primary noun includes people (guests). About human social behavior (overstaying welcome)."
"For want of a nail the shoe was lost; for want of a shoe the horse was lost; and for want of a horse the man was lost","TRUE","causal_chain","Classic cascading loss. Small item (nail) -> shoe -> horse -> man. Swappable: for want of a rivet the plate was lost. A=nail, B=shoe, C=horse, D=man."
"Forewarned is forearmed","FALSE","NONE","About abstract concepts (knowledge, preparation). No concrete nouns. Wordplay on fore-."
"Give a dog a bad name and hang him","FALSE","NONE","Dog represents a person. About human reputation and social judgment."
"Give a man rope enough and he will hang himself","FALSE","NONE","Primary noun is a person (man). About human self-destruction."
"Good fences make good neighbours","FALSE","NONE","Primary noun includes people (neighbours). About human social boundaries."
"Half a loaf is better than no bread","TRUE","proportional_mismatch","Partial resource better than none. Swappable: half a bucket is better than no water. A=half loaf, B=no bread."
"He that goes a-borrowing, goes a-sorrowing","FALSE","NONE","Primary noun is a person (he). About human behavior (debt)."
"Horses for courses","FALSE","NONE","About matching suitability. Depends on wordplay (horses/courses rhyme). Breaks on swap."
"If ifs and ands were pots and pans, there would be no work for tinkers","FALSE","NONE","Depends on wordplay (ifs and ands/pots and pans). Primary noun includes person/role (tinkers)."
"If ifs and buts were candies and nuts, we'd all have a merry Christmas[13]","FALSE","NONE","Depends on wordplay. About abstract concepts (excuses). Cultural reference (Christmas)."
"If it ain't broke, don't fix it","FALSE","NONE","No specific concrete nouns (uses generic 'it'). About abstract concept of leaving well enough alone."
"If it were a snake, it would have bit you","FALSE","NONE","About human inattention. Snake is metaphor for something obvious. About behavior."
"If the shoe fits, wear it","FALSE","NONE","Shoe is metaphor for criticism/description. About accepting truth about oneself. Meaning breaks on noun swap."
"If wishes were horses, beggars would ride","FALSE","NONE","Primary noun includes people/roles (beggars). About abstract concept (wishing). Duplicate of #31."
"If you can't run with the horses, get off the track","FALSE","NONE","Horses represent competitors/people. About human ability and knowing your limits."
"If you can't stand the heat, get out of the kitchen","FALSE","NONE","Kitchen/heat are metaphors for workplace pressure. About human resilience. Meaning is social, not physical."
"If you give a mouse a cookie, he'll always ask for a glass of milk","FALSE","NONE","Mouse represents a person (from children's book). About human behavior (escalating demands). Anthropomorphized animal."
"If you lie down with dogs, you will get up with fleas[a] (James Sandford, The Garden of Pleasure)","TRUE","conditional_animal_wisdom","Association with X gives you X's problems. Swappable: if you sleep in the barn, you'll wake with hay. A=dogs, B=fleas. Animal behavior encodes consequence."
"If you pay peanuts, you get monkeys","FALSE","NONE","Peanuts/monkeys represent wages/workers. About human employment and compensation. Metaphorical."
"If you play with fire, you will get burned","FALSE","NONE","Only one concrete noun (fire). About human recklessness. Too metaphorical to swap meaningfully."
"In for a penny, in for a pound","FALSE","NONE","About human behavior (commitment escalation). Penny/pound represent stakes. Metaphorical."
"(March comes) in like a lion, (and goes) out like a lamb","FALSE","NONE","About weather/seasons. Animals are simile vehicles, not structural nouns. Culturally specific to March."
"It ain't over till the fat lady sings","FALSE","NONE","Primary noun is a person (fat lady). Cultural reference (opera). About human patience."
"It is no use crying over spilt milk","TRUE","tautological_wisdom","Can't undo a spillage. Swappable: no use sweeping up shattered glass. A=spilt milk. Irreversible physical event."
"It is the early bird that gets the worm","TRUE","conditional_animal_wisdom","First arrival gets the resource. Swappable: the early cat catches the mouse. A=early bird, B=worm. Animal behavior encodes timing lesson."
"It is the squeaky wheel that gets the grease","TRUE","uncategorized","Malfunctioning part draws maintenance attention. Swappable: the leaky pipe gets the patch. A=squeaky wheel, B=grease."
"It never rains but it pours","FALSE","NONE","About abstract concept (misfortune clustering). Rain is metaphorical. Only one type of noun."
"It takes a thief to catch a thief","FALSE","NONE","Primary noun is a person/role (thief). About human knowledge/expertise."
"Islands depend on reeds, just as reeds depend on islands (Myanmar proverbs)[citation needed]","TRUE","uncategorized","Mutual dependency between two physical things. Swappable: walls depend on mortar, just as mortar depends on walls. A=islands, B=reeds."
"Keep your powder dry (Valentine Blacker, 1834 from Oliver's Advice)[16]","FALSE","NONE","Only one concrete noun (powder). About human preparedness. Military metaphor."
"Kill two birds with one stone.","TRUE","proportional_mismatch","One resource accomplishes multiple tasks. Swappable: catch two fish with one net. A=two birds, B=one stone. Efficiency through concrete objects."
"The last drop makes the cup run over","TRUE","causal_chain","Final small addition causes overflow. Swappable: the last straw breaks the beam. A=last drop, B=cup overflow."
"Laugh before breakfast, cry before supper","FALSE","NONE","About human emotion and superstition. No concrete swappable nouns (laugh/cry are actions about people)."
"Learn a language, and you will avoid a war (Arab proverb)","FALSE","NONE","About abstract concepts (language, war, communication). About human behavior."
"Let sleeping dogs lie","FALSE","NONE","Dogs represent dormant problems/people. About human caution and social dynamics."
"Let well alone","FALSE","NONE","About abstract concept. No concrete nouns at all."
"Little pitchers have big ears","FALSE","NONE","Wordplay: pitchers have 'ears' (handles) = children eavesdrop. Depends on pun."
"Little strokes fell great oaks","TRUE","causal_chain","Many small actions achieve a big result. Swappable: small drops fill great barrels. A=little strokes, B=great oaks."
"Make hay while the sun shines","TRUE","uncategorized","Process resource while conditions allow. Swappable: dry the salt while the wind blows. A=hay, B=sun. Concrete time-dependent process."
"Many a mickle makes a muckle","FALSE","NONE","Depends on wordplay (mickle/muckle are archaic words). About abstract accumulation."
"March comes in like a lion and goes out like a lamb","FALSE","NONE","Duplicate of #141. About weather/seasons. Animals are simile vehicles."
"Milking the bull","TRUE","futile_preparation","Attempting to extract product from wrong source. Swappable: shearing the pig. A=milking, B=bull. Wrong-source futility with concrete nouns."
"Money demands care, you abuse it and it disappears – Rashida Costa","FALSE","NONE","About abstract concept (money management). About human behavior."
"Never cast a clout until May be out","FALSE","NONE","Culturally specific (May=month or hawthorn). About human clothing choices. Breaks on swap."
"Never look a gift horse in the mouth","FALSE","NONE","Meaning depends on specific horse-inspection practice (teeth = age). About human ingratitude. Culturally specific."
"No friends but the mountains[24]","FALSE","NONE","About human isolation and social relationships (Kurdish proverb). Friends are people."
"Oil and water do not mix","TRUE","tautological_wisdom","Two incompatible substances won't combine. Swappable: sand and mercury do not blend. A=oil, B=water. Physical property stated as wisdom."
"One hand washes the other","FALSE","NONE","Hands represent people. About human reciprocity and cooperation."
"One kind word can warm three winter months","FALSE","NONE","About abstract concept (kindness). Primary subject is human speech/emotion."
"One man's meat is another man's poison","FALSE","NONE","Primary nouns include people (man). About human subjective preference."
"One might as well be hanged for a sheep as a lamb","FALSE","NONE","About human punishment/justice. Person is implied primary subject. Sheep/lamb represent the crime's magnitude."
"One swallow does not make a summer","TRUE","tautological_wisdom","Single instance doesn't establish a pattern. Swappable: one raindrop does not make a flood. A=swallow, B=summer. Concrete sign vs full phenomenon."
"One year's seeding makes seven years weeding","TRUE","causal_chain","Small initial neglect creates prolonged cleanup. Swappable: one season's rust makes seven seasons' sanding. A=seeding, B=weeding. Concrete agricultural cause-effect."
"Out of the frying pan and into the fire","TRUE","uncategorized","Escape from one danger into a worse one. Swappable: out of the puddle and into the lake. A=frying pan, B=fire. Concrete escalation of same type."
"Penny wise and pound foolish","TRUE","proportional_mismatch","Careful with small amount, wasteful with large. Swappable: thimble-wise and barrel-foolish. A=penny (small), B=pound (large)."
"Penny, Penny. Makes many.","FALSE","NONE","About abstract concept (accumulation of money). Only one concrete noun. Too vague."
"Putting the cart before the horse","TRUE","uncategorized","Sequence/order matters for physical function. Swappable: putting the roof before the walls. A=cart, B=horse. Duplicate of #105."
"Red sky at night shepherd's delight; red sky in the morning, shepherd's warning","FALSE","NONE","Primary noun includes person/role (shepherd). Weather prediction based on culturally specific observation. Breaks on swap."
"Risk it for a biscuit.[27]","FALSE","NONE","Depends on wordplay/rhyme (risk/biscuit). About human behavior (risk-taking). Only one concrete noun."
"See a pin and pick it up, all the day you will have good luck; See a pin and let it lay, bad luck you will have all day","FALSE","NONE","About superstition. Only one concrete noun (pin). About human behavior and luck."
"Set a thief to catch a thief","FALSE","NONE","Primary noun is person/role (thief). Duplicate concept of #147. About human expertise."
"Softly, softly, catchee monkey","FALSE","NONE","About human behavior (patience, strategy). Monkey represents a goal/person. Colonial-era idiom."
"Speak as you find","FALSE","NONE","About human behavior (honesty). No concrete nouns."
"Speak of the devil and he shall/is sure/will appear","FALSE","NONE","About abstract/supernatural concept. About human social coincidence."
"Strike while the iron is hot","TRUE","uncategorized","Act on material while conditions allow. Swappable: pour while the wax is soft. A=iron, condition=hot. Concrete time-dependent action. Similar to #158."
"(A) swarm in May is worth a load of hay; a swarm in June is worth a silver spoon; but a swarm in July is not worth a fly","FALSE","NONE","Depends on specific month-value associations and rhyme scheme. Culturally specific beekeeping knowledge. Breaks on swap."
"Take care of the pennies, and the pounds will take care of themselves","TRUE","causal_chain","Attention to small units accumulates to large result. Swappable: mind the drops and the barrels fill themselves. A=pennies, B=pounds."
"The apple does not fall/never falls far from the tree","FALSE","NONE","About human heredity/behavior (children resemble parents). Apple/tree represent parent-child. About people."
"The best-laid schemes of mice and men often go awry","FALSE","NONE","Primary noun includes people (men). About human planning and failure. Literary reference (Burns)."
"The bread never falls but on its buttered side","TRUE","tautological_wisdom","Things go wrong in the worst way. Swappable: the toast always lands jam-side down. A=bread, B=buttered side. Physical (Murphy's law with concrete nouns)."
"The cobbler always wears the worst shoes","TRUE","ironic_deficiency","Producer of X lacks X. Swappable: the baker's family eats stale bread. A=cobbler, X=shoes. Classic ironic deficiency."
"The early bird catches the worm","TRUE","conditional_animal_wisdom","First arrival gets the resource. Swappable: the early cat catches the mouse. A=early bird, B=worm. Duplicate of #144."
"It is the last straw that breaks the camel's back","TRUE","causal_chain","Final small addition causes collapse. Swappable: the last grain tips the scale. A=last straw, B=camel's back."
"The left hand doesn't know what the right hand is doing","FALSE","NONE","Hands represent people/departments. About human organizational dysfunction."
"The Moon is made of green cheese","FALSE","NONE","Deliberate absurdity/irony. About human gullibility. Not a wisdom structure."
"The more the merrier","FALSE","NONE","About abstract concept. No concrete nouns. About human social preference."
"The nail that sticks out gets hammered down","FALSE","NONE","Nail represents a nonconformist person. About human social conformity pressure (Japanese cultural proverb)."
"The old wooden spoon beats me down","FALSE","NONE","Primary subject is a person (me). About human experience/punishment."
"The shoemaker's son always goes barefoot","TRUE","ironic_deficiency","Producer of X's family lacks X. Swappable: the brewer's household drinks water. A=shoemaker, X=shoes. Classic ironic deficiency."
"The squeaky wheel gets the grease","TRUE","uncategorized","Malfunctioning part gets maintenance attention. Swappable: the leaky pipe gets the solder. A=squeaky wheel, B=grease. Duplicate of #145."
"The squeaky wheel gets the oil","TRUE","uncategorized","Malfunctioning part gets maintenance. Variant of #145/#199. Swappable: the rusty hinge gets the lubricant. A=squeaky wheel, B=oil."
"The streets are paved with gold","FALSE","NONE","About abstract concept (opportunity, wealth). Metaphorical. Not about physical streets."
"The stupid monkey knows not to eat the banana skin","FALSE","NONE","About human behavior (common sense). Monkey represents a person. 'Even a fool knows X.'"
"There are more ways of killing a cat than choking it with cream","FALSE","NONE","About human problem-solving (multiple approaches). Cat represents the problem. Violent metaphor."
"There are always more fish in the sea","FALSE","NONE","Fish represent people (romantic partners). About human relationships and moving on."
"There is many a good tune played on an old fiddle","FALSE","NONE","Fiddle represents an older person. About human capability despite age. Metaphorical."
"Too many cooks spoil the broth","FALSE","NONE","Primary nouns include people/roles (cooks). About human coordination failure and too many people involved."
"Too much rain makes a flood","TRUE","tautological_wisdom","Excess of input creates overflow. Swappable: too much heat makes a fire. A=rain, B=flood. Physical cause-effect."
"Two birds with one stone","TRUE","proportional_mismatch","One resource accomplishes multiple goals. Shortened duplicate of #150. Swappable: two fish with one net. A=two birds, B=one stone."
"Use it or lose it","FALSE","NONE","About abstract concept. No concrete nouns. About human behavior."
"United we bargain; divided we beg","FALSE","NONE","About human collective behavior (labor/politics). No concrete nouns."
"Walk softly but carry a big stick (26th U.S. President Theodore Roosevelt, 1900 in letter relating an old African proverb)[36]","FALSE","NONE","About human behavior (diplomacy, power). Stick is metaphor for military/political power."
"Walnuts and pears you plant for your heirs","FALSE","NONE","Primary noun includes people (heirs). About human generational planning. Depends on specific slow-growing trees."
"What is sauce for the goose is sauce for the gander","FALSE","NONE","Goose/gander represent people (often gendered). About human fairness and equal treatment."
"What the eye does not see (the heart does not grieve over)","FALSE","NONE","About human psychology. Eye/heart are body parts representing perception/emotion."
"When it rains it pours","FALSE","NONE","Duplicate of #146. About abstract concept (misfortune clustering)."
"When the cat is away, the mice will play","FALSE","NONE","Cat/mice represent authority figure and subordinates. About human behavior when unsupervised."
"Who will bell the cat?","FALSE","NONE","About human courage and collective action. Cat represents danger. People are the implied subject."
"Why keep a dog and bark yourself?","FALSE","NONE","Duplicate concept of #103. About human delegation. Dog represents subordinate."
"You can lead a horse to water, but you cannot make it drink","FALSE","NONE","Horse represents a person. About human stubbornness and free will."
"You cannot burn a candle at both ends.","FALSE","NONE","Candle represents human energy/time. About overwork and exhaustion. Metaphorical."
"You cannot make a silk purse from a sow's ear","TRUE","material_transformation","Can't make quality output from poor input. Swappable: can't make a gold ring from a lead slug. A=silk purse, B=sow's ear."
"You cannot make bricks without straw","TRUE","material_transformation","Can't produce output without essential input material. Swappable: can't bake bread without flour. A=bricks, B=straw."
"You cannot run with the hare and hunt with the hounds","FALSE","NONE","Hare/hounds represent opposing human factions. About human loyalty and picking sides."
"(You cannot) teach an old dog new tricks","FALSE","NONE","Dog represents a person. About human adaptability and age. Metaphorical."
"You cannot unscramble eggs","TRUE","tautological_wisdom","Irreversible transformation can't be undone. Swappable: you cannot unbake bread. A=eggs, B=scrambling. States irreversibility as wisdom."
"You catch more flies with honey than with vinegar","FALSE","NONE","Flies represent people. Honey/vinegar represent kind vs harsh approaches. About human persuasion."
"You pay your dime and you takes your chances","FALSE","NONE","About human behavior (gambling, risk). Only one concrete noun (dime). About accepting consequences."
"You scratch my back and I will scratch yours","FALSE","NONE","About human reciprocity. Primary subjects are people. Back-scratching is metaphor for favors."
"You've got to separate the wheat from the chaff","TRUE","material_transformation","Must sort valuable from worthless material. Swappable: sift the gold from the sand. A=wheat, B=chaff."
"You've made your bed and you must lie in/on it","FALSE","NONE","About human behavior (accepting consequences). Bed-making is metaphor for choices. About personal responsibility."
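The labeled rows above can be summarized by template category. Below is a minimal sketch that assumes each row has four fields in this order: proverb text, a TRUE/FALSE usability flag, a category (or NONE), and a note. The header row of `classified_proverbs.csv` is not shown in this part of the diff, so the sketch skips any row whose flag is not TRUE. The name `tally_usable_categories` is hypothetical.

```python
import csv
import io
from collections import Counter


def tally_usable_categories(csv_text):
    """Count template categories among rows whose usability flag is TRUE.

    Assumes four columns per row: proverb, flag, category, note. Any row whose
    flag is not TRUE (a FALSE row, a header, or a short line) is ignored.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 3 and row[1].strip().upper() == "TRUE":
            counts[row[2].strip()] += 1
    return counts


# A few rows from the file above, with the notes shortened.
sample = '''"Milking the bull","TRUE","futile_preparation","Wrong-source futility."
"One hand washes the other","FALSE","NONE","Hands represent people."
"Too much rain makes a flood","TRUE","tautological_wisdom","Physical cause-effect."
"One swallow does not make a summer","TRUE","tautological_wisdom","Sign vs phenomenon."
'''

print(tally_usable_categories(sample).most_common())
# [('tautological_wisdom', 2), ('futile_preparation', 1)]
```

Rows marked `uncategorized` are counted as their own bucket. This makes it easy to see how many usable proverbs still lack a template.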
11097
data/folksy_relations.csv
Normal file
File diff suppressed because it is too large
535
data/folksy_vocab.csv
Normal file
@@ -0,0 +1,535 @@
word,categories,tangibility_score,conceptnet_edge_count,frequency_rank
water,beverage,0.89,2393,0
fish,animal,0.57,1967,0
iron,"metal,mineral",0.93,1523,0
salt,"mineral,spice",0.95,1500,0
tree,"plant,wood",0.97,1463,0
computer,tool,0.95,1136,0
wood,material,0.91,1096,0
rock,stone,0.86,860,0
ship,vehicle,1.0,847,0
copper,metal,0.75,824,0
insect,animal,1.0,767,0
car,"tool,vehicle",0.91,761,0
paper,material,0.93,725,0
cat,animal,0.89,715,0
church,building,0.98,713,0
school,building,0.97,684,0
aluminum,"material,metal",0.4,656,0
magnesium,metal,0.5,653,0
bed,furniture,0.99,625,0
organ,instrument,0.92,622,0
boat,vehicle,0.99,612,0
box,container,0.99,590,0
wine,"beverage,food",0.85,589,0
glass,"container,mineral,vehicle",0.76,579,0
silver,metal,0.84,560,0
heart,container,0.93,559,0
scale,instrument,0.98,550,0
office,building,1.0,532,0
egg,food,0.88,531,0
root,plant,1.0,527,0
plane,vehicle,1.0,513,0
grass,plant,0.94,504,0
milk,beverage,0.85,504,0
gold,metal,0.76,487,0
potassium,mineral,1.0,444,0
fly,insect,0.67,428,0
weapon,tool,0.98,428,0
shell,container,1.0,424,0
cheese,food,0.95,411,0
ice,material,0.74,402,0
vegetable,food,0.0,402,0
level,tool,0.93,397,0
plastic,material,0.87,394,0
gun,weapon,0.91,390,0
beer,beverage,0.84,383,0
knife,tool,0.97,379,0
cabinet,furniture,1.0,377,0
leather,"fabric,material",0.87,375,0
desk,furniture,1.0,371,0
stem,plant,1.0,366,0
dress,clothing,0.62,364,0
rope,material,1.0,356,0
rail,bird,1.0,352,0
cotton,"crop,fabric,plant",0.98,349,0
tin,metal,0.97,347,0
chicken,food,0.89,339,0
worm,animal,1.0,329,0
wool,"fabric,material",0.94,318,0
potato,"food,vegetable",1.0,317,0
coal,rock,1.0,315,0
bat,animal,0.96,314,0
steel,metal,0.88,309,0
corn,"food,vegetable",0.88,293,0
library,building,0.99,285,0
coffee,beverage,0.83,284,0
titanium,metal,0.0,281,0
blade,weapon,1.0,280,0
clay,material,0.75,273,0
pot,tool,0.98,271,0
prison,building,0.98,271,0
mine,weapon,1.0,263,0
crab,animal,1.0,260,0
jar,container,1.0,260,0
theater,building,1.0,260,0
rice,grain,0.76,259,0
bottle,container,0.96,258,0
mercury,metal,0.56,258,0
bark,plant,0.4,257,0
nickel,metal,0.67,252,0
drawer,container,1.0,251,0
stable,building,0.88,242,0
bomb,weapon,0.98,234,0
sponge,animal,0.67,233,0
garden,landscape,0.97,232,0
punch,tool,0.0,231,0
nut,seed,1.0,230,0
bee,insect,0.99,228,0
drum,instrument,1.0,227,0
well,structure,1.0,227,0
rose,plant,0.67,224,0
beef,food,0.92,223,0
wax,material,0.85,222,0
fox,animal,1.0,218,0
turkey,food,1.0,211,0
wheat,grain,0.97,210,0
butter,food,0.8,207,0
truck,vehicle,0.93,207,0
coach,vehicle,0.8,206,0
skate,fish,0.01,206,0
ant,insect,1.0,202,0
phone,tool,0.95,202,0
soup,food,1.0,202,0
ridge,landscape,0.0,202,0
orange,"food,fruit",0.54,201,0
butterfly,insect,0.8,200,0
nail,tool,1.0,199,0
nest,shelter,1.0,196,0
rabbit,animal,0.67,193,0
tea,beverage,1.0,193,0
airplane,vehicle,0.98,191,0
apartment,building,0.98,189,0
fence,structure,0.23,189,0
triangle,"instrument,tool",1.0,188,0
diamond,"material,stone",0.76,187,0
hollow,landscape,0.14,187,0
candle,tool,1.0,183,0
guitar,instrument,0.93,182,0
saw,tool,1.0,181,0
chocolate,food,0.76,179,0
spider,animal,0.78,179,0
chest,container,1.0,175,0
evergreen,tree,0.0,174,0
mall,building,1.0,173,0
onion,vegetable,1.0,171,0
mushroom,vegetable,1.0,170,0
piano,instrument,0.96,167,0
beaver,animal,1.0,166,0
shelter,building,0.8,165,0
dirt,material,0.75,164,0
deer,animal,1.0,162,0
barrel,container,1.0,161,0
bean,vegetable,1.0,159,0
pit,"landscape,seed",1.0,159,0
saddle,tool,1.0,157,0
bladder,container,0.71,154,0
cock,bird,1.0,154,0
tank,"container,weapon",1.0,154,0
linen,fabric,1.0,152,0
oak,"tree,wood",1.0,152,0
tomato,fruit,0.75,152,0
marble,"rock,stone",0.82,149,0
jack,tool,0.83,148,0
lion,animal,0.94,148,0
bull,animal,1.0,147,0
tie,clothing,1.0,147,0
straw,"crop,material",0.75,147,0
comb,tool,1.0,146,0
rifle,weapon,0.95,145,0
cannon,weapon,0.97,144,0
rat,animal,1.0,141,0
hawk,bird,0.0,139,0
jacket,clothing,0.87,137,0
mole,animal,0.0,135,0
candy,food,0.88,134,0
cardinal,bird,0.0,134,0
drill,tool,1.0,133,0
cart,vehicle,1.0,132,0
anchor,metal,1.0,130,0
salmon,food,0.0,130,0
hay,"crop,plant",0.9,129,0
vine,plant,1.0,128,0
spear,weapon,0.0,124,0
ash,material,1.0,123,0
cereal,food,0.86,122,0
pond,"landscape,water",0.94,122,0
ferret,animal,1.0,121,0
quartz,"mineral,rock",1.0,121,0
barn,building,1.0,120,0
bucket,container,1.0,120,0
pizza,food,0.9,120,0
turtle,animal,0.75,120,0
pigeon,bird,1.0,119,0
trumpet,instrument,0.96,119,0
mortar,weapon,0.93,118,0
orchid,flower,1.0,117,0
pepper,spice,0.75,117,0
mold,organism,1.0,117,0
bronze,metal,0.0,116,0
wolf,animal,1.0,116,0
platinum,metal,0.0,115,0
seaweed,plant,0.88,115,0
gazelle,animal,1.0,114,0
lemon,fruit,0.5,111,0
salad,food,0.94,110,0
ladder,tool,1.0,108,0
lever,tool,1.0,108,0
pistol,weapon,0.96,108,0
pitcher,container,1.0,108,0
banana,food,0.73,107,0
bass,"fish,instrument",1.0,107,0
cannabis,plant,0.89,107,0
pine,"plant,wood",0.0,106,0
snail,animal,1.0,106,0
wasp,insect,1.0,106,0
wedge,tool,1.0,106,0
pod,container,1.0,105,0
ginger,spice,0.0,104,0
pea,vegetable,0.0,104,0
moss,plant,1.0,104,0
goose,"animal,bird",0.0,103,0
underwear,clothing,1.0,102,0
chalk,"material,mineral",1.0,102,0
pick,tool,1.0,101,0
pebble,stone,1.0,98,0
lemur,animal,1.0,95,0
porch,structure,1.0,94,0
meadow,landscape,1.0,94,0
flute,instrument,0.95,93,0
marmoset,animal,1.0,93,0
marmot,animal,1.0,93,0
van,vehicle,0.83,93,0
iris,"flower,plant",0.5,92,0
cork,material,1.0,92,0
canoe,vehicle,0.95,91,0
rust,material,0.0,91,0
blowfish,animal,1.0,90,0
harp,instrument,0.91,89,0
cabbage,"plant,vegetable",0.83,88,0
kite,bird,1.0,88,0
tub,container,1.0,88,0
ham,food,1.0,86,0
meteorite,stone,0.0,85,0
soda,beverage,0.82,85,0
steak,food,0.8,85,0
curry,spice,0.62,84,0
mug,container,1.0,83,0
violet,flower,0.0,83,0
burrow,shelter,1.0,83,0
gallery,building,1.0,82,0
pike,fish,0.95,82,0
ditch,landscape,1.0,82,0
granite,stone,0.77,80,0
handgun,weapon,1.0,80,0
tungsten,metal,1.0,80,0
flask,container,1.0,79,0
squash,fruit,0.8,78,0
gem,stone,0.0,76,0
tar,material,0.67,76,0
axe,tool,0.67,75,0
hamburger,food,0.75,75,0
pasture,landscape,0.0,75,0
canon,weapon,1.0,74,0
scarf,clothing,0.83,74,0
crow,bird,1.0,73,0
pickle,"food,vegetable",1.0,73,0
flint,stone,1.0,72,0
owl,bird,1.0,72,0
tiger,animal,0.75,72,0
lantern,tool,1.0,72,0
furrow,landscape,0.0,72,0
bamboo,plant,0.75,70,0
dolphin,"animal,fish",0.67,70,0
creek,"landscape,water",0.94,70,0
peach,fruit,1.0,69,0
cucumber,vegetable,1.0,67,0
lobster,food,1.0,67,0
swan,bird,0.0,67,0
swift,bird,0.0,66,0
chimney,structure,1.0,66,0
eel,fish,1.0,65,0
shotgun,weapon,0.0,65,0
finch,bird,0.0,63,0
garnet,stone,0.0,63,0
duplex,building,1.0,62,0
mandolin,instrument,0.97,62,0
slug,food,0.0,62,0
plow,tool,1.0,62,0
thorn,plant,0.0,62,0
husk,plant,0.0,62,0
cinnamon,spice,1.0,61,0
clarinet,instrument,1.0,61,0
dove,bird,0.0,61,0
nylon,material,1.0,61,0
puppy,animal,0.44,61,0
strawberry,fruit,0.0,61,0
tractor,vehicle,1.0,61,0
herring,fish,1.0,60,0
pulley,tool,0.0,60,0
accordion,instrument,1.0,59,0
gymnasium,building,1.0,59,0
sailboat,vehicle,1.0,59,0
hummingbird,bird,0.0,58,0
den,shelter,1.0,58,0
tinamou,bird,0.0,57,0
bin,container,1.0,56,0
eagle,bird,1.0,56,0
gull,bird,0.0,56,0
oboe,instrument,1.0,56,0
rook,bird,1.0,56,0
trout,fish,0.0,56,0
burger,food,0.0,55,0
revolver,weapon,0.93,55,0
kettle,container,1.0,55,0
lettuce,"food,plant,vegetable",0.89,54,0
ruby,stone,0.0,54,0
saxophone,"instrument,tool",1.0,54,0
sprout,vegetable,0.0,54,0
zebra,animal,1.0,54,0
spade,tool,1.0,53,0
tortoise,animal,1.0,53,0
falcon,bird,0.0,52,0
pineapple,fruit,1.0,52,0
yoke,tool,1.0,52,0
chili,spice,0.0,51,0
hearth,structure,0.0,51,0
chisel,tool,0.0,50,0
harmonica,instrument,0.97,50,0
trough,container,0.0,50,0
broom,tool,1.0,50,0
denim,fabric,1.0,49,0
earthworm,animal,0.0,49,0
thrush,bird,0.0,49,0
vase,container,1.0,49,0
xylophone,instrument,1.0,49,0
lily,flower,0.0,48,0
penguin,animal,0.67,48,0
spaghetti,food,1.0,48,0
quail,bird,0.0,47,0
mica,mineral,0.0,46,0
synthesizer,instrument,1.0,46,0
cello,instrument,0.94,45,0
viola,instrument,1.0,45,0
pail,container,1.0,44,0
woodpecker,bird,1.0,44,0
fig,fruit,0.0,43,0
jug,container,1.0,43,0
lighter,tool,1.0,43,0
nutmeg,spice,0.0,43,0
soot,material,0.0,43,0
poppy,flower,0.0,42,0
greenhouse,building,1.0,41,0
polyester,material,0.0,41,0
warbler,bird,0.0,41,0
celery,vegetable,0.6,40,0
rhinoceros,animal,1.0,40,0
spinach,vegetable,0.0,40,0
acorn,seed,1.0,38,0
chimpanzee,animal,1.0,38,0
lyre,instrument,0.0,37,0
heron,bird,0.0,36,0
mango,fruit,1.0,36,0
talc,mineral,1.0,36,0
vulture,bird,0.0,36,0
zither,instrument,0.0,36,0
agate,stone,0.0,35,0
gorilla,animal,1.0,35,0
recorder,instrument,0.0,35,0
cuckoo,bird,0.0,34,0
eggplant,vegetable,0.0,34,0
lark,bird,1.0,34,0
minnow,fish,0.0,34,0
daisy,flower,1.0,33,0
dandelion,flower,0.0,33,0
flounder,fish,1.0,33,0
pheasant,bird,0.0,33,0
stork,bird,0.0,33,0
orchard,landscape,1.0,33,0
beet,vegetable,0.0,32,0
maple,wood,0.0,32,0
peacock,bird,1.0,32,0
barometer,instrument,1.0,31,0
burlap,fabric,1.0,31,0
cranberry,fruit,0.0,31,0
decanter,container,1.0,31,0
mace,weapon,1.0,31,0
ostrich,"animal,bird",0.0,31,0
anise,spice,0.0,30,0
kayak,vehicle,0.0,30,0
wren,bird,0.0,30,0
anvil,tool,1.0,30,0
cicada,insect,0.0,29,0
grouse,bird,0.0,29,0
parsley,herb,1.0,29,0
stethoscope,tool,1.0,29,0
weaver,bird,0.0,29,0
canary,bird,0.0,28,0
firefly,insect,1.0,28,0
micrometer,instrument,0.0,28,0
turmeric,spice,0.0,28,0
cedar,tree,1.0,27,0
emerald,stone,0.0,27,0
silo,building,0.67,27,0
blackbird,bird,0.0,26,0
cumin,spice,1.0,26,0
earwig,insect,1.0,26,0
kiwi,"animal,fruit",1.0,26,0
rasp,tool,0.0,26,0
robin,bird,1.0,26,0
tulip,flower,1.0,26,0
thimble,tool,0.0,26,0
cauliflower,vegetable,0.0,25,0
juniper,plant,0.0,25,0
thyme,"herb,spice",0.0,25,0
wrapper,material,0.0,25,0
carton,container,1.0,24,0
goldfish,fish,0.75,24,0
holly,tree,0.0,24,0
kingfisher,bird,0.0,24,0
mahogany,wood,0.0,24,0
trowel,tool,1.0,24,0
daffodil,flower,0.0,23,0
jay,bird,0.0,23,0
peppermint,plant,0.0,23,0
python,animal,0.0,23,0
basil,"herb,spice",1.0,22,0
centipede,animal,0.0,22,0
partridge,bird,0.0,22,0
pewter,metal,0.0,22,0
tern,bird,1.0,22,0
bellows,tool,0.0,22,0
asparagus,vegetable,0.0,21,0
awl,tool,0.0,21,0
cockatoo,bird,0.0,21,0
emu,animal,0.0,21,0
hickory,tree,0.0,21,0
horsetail,herb,0.0,21,0
magpie,bird,0.0,21,0
chips,food,0.88,20,0
cypress,plant,0.0,20,0
dogwood,tree,0.0,20,0
puffin,bird,0.0,20,0
zucchini,vegetable,1.0,20,0
auk,bird,0.0,19,0
ukulele,instrument,0.0,19,0
coca,tree,0.0,18,0
coriander,spice,0.0,18,0
poplar,wood,0.0,18,0
rhea,"animal,bird",0.0,18,0
styrofoam,material,0.67,18,0
kale,vegetable,1.0,17,0
lilac,flower,0.0,17,0
lynx,animal,0.0,17,0
nightjar,bird,0.0,17,0
oregano,herb,1.0,17,0
parakeet,bird,0.0,17,0
radish,vegetable,1.0,17,0
snapper,fish,0.0,17,0
starling,bird,0.0,17,0
toucan,bird,0.0,17,0
watermelon,"fruit,plant",1.0,17,0
cormorant,bird,0.0,16,0
dodo,bird,0.0,16,0
ginseng,plant,1.0,16,0
martin,bird,0.0,16,0
petrel,bird,0.0,16,0
sardine,fish,0.0,16,0
unicycle,vehicle,0.0,16,0
coop,building,0.0,16,0
bison,animal,0.0,15,0
booby,bird,0.0,15,0
grouper,fish,0.0,15,0
nightingale,"animal,bird",0.0,15,0
paprika,spice,0.0,15,0
teacup,container,0.0,15,0
albatross,bird,0.0,14,0
buzzard,bird,0.0,14,0
cardamom,spice,0.0,14,0
carnation,flower,0.0,14,0
cockatiel,bird,0.0,14,0
cornbread,food,1.0,14,0
flamingo,bird,0.0,14,0
hacksaw,tool,0.0,14,0
lotus,flower,1.0,14,0
papaya,fruit,0.0,14,0
tanager,bird,0.0,14,0
verbena,plant,0.0,14,0
barracuda,fish,0.0,13,0
bunting,bird,0.0,13,0
caddy,container,0.0,13,0
chard,vegetable,0.0,13,0
coot,bird,0.0,13,0
pelican,bird,0.0,13,0
caliper,instrument,1.0,12,0
cattail,plant,1.0,12,0
flycatcher,bird,0.0,12,0
ibis,bird,0.0,12,0
kestrel,bird,0.0,12,0
nectarine,fruit,0.0,12,0
vise,tool,1.0,12,0
yew,plant,0.0,12,0
boxwood,plant,0.0,11,0
grebe,bird,0.0,11,0
haircloth,fabric,1.0,11,0
opener,tool,0.0,11,0
osprey,bird,0.0,11,0
peony,flower,0.0,11,0
roach,insect,0.0,11,0
gnocchi,food,1.0,10,0
mealybug,insect,1.0,10,0
rhododendron,plant,0.0,10,0
bongo,instrument,0.0,9,0
gannet,bird,0.0,9,0
hibiscus,flower,0.0,9,0
hornbill,bird,0.0,9,0
housefly,insect,0.0,9,0
mallard,bird,0.0,9,0
avocet,bird,0.0,8,0
cassowary,"animal,bird",0.0,8,0
chickadee,bird,0.0,8,0
crappie,fish,0.0,8,0
moped,vehicle,0.0,8,0
okra,vegetable,0.0,8,0
planer,tool,0.0,8,0
wagtail,bird,0.0,8,0
wolfram,metal,0.0,8,0
bulbul,bird,0.0,7,0
kohlrabi,vegetable,0.0,7,0
roadrunner,bird,0.0,7,0
thrasher,bird,0.0,7,0
barbet,bird,0.0,6,0
curassow,bird,0.0,6,0
lorikeet,bird,0.0,6,0
lovebird,bird,0.0,6,0
oriole,bird,0.0,6,0
quetzal,bird,0.0,6,0
sapsucker,bird,0.0,6,0
alabaster,stone,0.0,5,0
azalea,plant,0.0,5,0
cowbird,bird,0.0,5,0
hoatzin,bird,0.0,5,0
jointer,tool,0.0,5,0
oilcan,container,0.0,5,0
orca,animal,0.0,5,0
posey,flower,0.0,5,0
waxwing,bird,0.0,5,0
bobwhite,bird,0.0,4,0
cotinga,bird,0.0,4,0
jicama,vegetable,0.0,4,0
lacebug,insect,1.0,4,0
marjoram,herb,0.0,4,0
oxpecker,bird,0.0,4,0
bowerbird,bird,0.0,3,0
condor,bird,0.0,3,0
gladiola,flower,0.0,3,0
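The `categories` column packs several tags into one quoted field (for example `"metal,mineral"`). A CSV reader therefore returns it as a single string that still has to be split on commas, which is what `FolksyGraph.load` does. The sketch below shows that step on a few rows copied from the file. `index_by_category` and its `min_tangibility` cutoff (default 0.5) are illustrative assumptions and are not part of the generator; the loader shown in this commit keeps every row.

```python
import csv
import io
from collections import defaultdict


def index_by_category(csv_text, min_tangibility=0.5):
    """Map each category tag to the words that carry it.

    Rows whose tangibility_score is below min_tangibility are skipped; this
    cutoff is an illustrative assumption, not something the generator does.
    """
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["tangibility_score"]) < min_tangibility:
            continue
        # The quoted field "metal,mineral" arrives as one string; split it here.
        for tag in row["categories"].split(","):
            tag = tag.strip()
            if tag:
                index[tag].append(row["word"])
    return dict(index)


sample = """word,categories,tangibility_score,conceptnet_edge_count,frequency_rank
iron,"metal,mineral",0.93,1523,0
salt,"mineral,spice",0.95,1500,0
titanium,metal,0.0,281,0
tree,"plant,wood",0.97,1463,0
"""

print(index_by_category(sample))
```

With the default cutoff, `titanium` (score 0.0) drops out, so `metal` maps only to `iron`. Passing `min_tangibility=0.0` keeps every row, which matches the loader's behavior. Many familiar nouns in this file score 0.0 (hawk, swan, pea), so any real cutoff would need tuning.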
59
examples/my_world.json
Normal file
@@ -0,0 +1,59 @@
{
  "entities": [
    {
      "name": "Xorhir",
      "categories": ["animal", "mount"],
      "relations": {
        "AtLocation": ["stable", "plains"],
        "UsedFor": ["riding", "hauling"],
        "HasA": ["saddle", "hooves", "thick hide"],
        "Desires": ["grain", "salt"]
      },
      "properties": ["stubborn", "loyal", "large"],
      "derived_from": ["horse", "ox"]
    },
    {
      "name": "Grushum",
      "categories": ["plant", "food"],
      "relations": {
        "AtLocation": ["field", "garden"],
        "HasPrerequisite": ["sun", "soil"],
        "UsedFor": ["feed", "brewing"]
      },
      "properties": ["leafy", "bitter", "seasonal"]
    },
    {
      "name": "turtleduck",
      "categories": ["animal", "bird"],
      "relations": {
        "AtLocation": ["pond", "riverbank"]
      },
      "properties": ["shy", "armored"],
      "derived_from": ["turtle", "duck"]
    },
    {
      "name": "ironwood",
      "categories": ["tree", "material"],
      "relations": {
        "AtLocation": ["forest", "mountain"],
        "UsedFor": ["building", "weapon"],
        "HasA": ["bark", "root"],
        "HasProperty": ["hard", "heavy"]
      },
      "properties": ["hard", "heavy", "ancient"],
      "derived_from": ["oak", "iron"]
    },
    {
      "name": "ashberry",
      "categories": ["plant", "food", "fruit"],
      "relations": {
        "AtLocation": ["orchard", "meadow"],
        "UsedFor": ["eating", "wine"],
        "MadeOf": ["seed", "juice"],
        "HasA": ["thorn", "pit"]
      },
      "properties": ["tart", "red", "small"],
      "derived_from": ["berry", "apple"]
    }
  ]
}
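An entities file like the one above can be sanity-checked before it is merged. In the visible part of `merge_fictional`, only `entity["name"]` is indexed directly; `categories`, `properties` and `derived_from` are read with `.get(..., [])`. The hypothetical `check_entity` below therefore treats only `name` as required. It checks the other keys for the list and object shapes used in `examples/my_world.json`. The authoritative rules presumably live in `schemas/fictional_entities.schema.json`, which is not shown here, so this is an assumption-based sketch.

```python
import json


def check_entity(entity):
    """Return a list of problems found in one fictional-entity dict.

    Only "name" is treated as required, mirroring the visible part of
    merge_fictional; the list/dict shapes are inferred from my_world.json.
    """
    problems = []
    name = entity.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("missing or empty 'name'")
    for key in ("categories", "properties", "derived_from"):
        value = entity.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"'{key}' should be a list of strings")
    relations = entity.get("relations", {})
    if not isinstance(relations, dict):
        problems.append("'relations' should be an object")
    else:
        for rel, targets in relations.items():
            if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
                problems.append(f"relation '{rel}' should map to a list of strings")
    return problems


doc = json.loads("""{"entities": [
  {"name": "turtleduck", "categories": ["animal", "bird"],
   "relations": {"AtLocation": ["pond", "riverbank"]},
   "derived_from": ["turtle", "duck"]},
  {"categories": ["plant"], "relations": {"UsedFor": "brewing"}}
]}""")

for i, ent in enumerate(doc["entities"]):
    print(i, check_entity(ent))
```

The first entity passes. The second is flagged twice: it has no name, and `UsedFor` is a bare string instead of a list.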
30
examples/sample_output.txt
Normal file
@@ -0,0 +1,30 @@
That's just drawwing the chalk and praying for press clothes.
The deer feeds everyone's fear hunters but its own.
My daddy always said, can't have eating vegetables without salt.
What's a pick but a rasp with ambition?
What's a centipede but a kiwi with patience?
There's a fella who grabs the forrest and says the leaves on branches's no good.
My grandmother used to say, 'sweeping the broom won't bring you punishment.'
He picks all the ocean water then wonders why the ocean water looks bare.
What's a gull but a jay with an attitude?
The plow feeds everyone's farm land but its own.
A lynx is just a earthworm that's got feline.
The only difference between a handgun and a cannon is a plan.
Nobody's got less spin webs than the man who makes the spider.
A daffodil is just a iris that's got a plan.
My grandmother used to say, 'coverring the haircloth won't bring you kint cardigan or sweater.'
Take the lay eggs from a crow and you've got yourself a lorikeet.
Funny how the owl never has enough rest during day for itself.
Don't raise the stable and act surprised when the leather show up.
You can't set up a grass and then wonder where all the gazelle came from.
My grandmother used to say, 'building the level won't bring you stirring.'
Take the ambition from a zither and you've got yourself a xylophone.
Don't make the plastic and act surprised when the box show up.
A ukulele is just a scale that's got an attitude.
What's a crappie but a trout with patience?
A emu is just a ferret that's got walk backwards.
Take the an attitude from a denim and you've got yourself a wool.
A have children don't come without its stable, now does it?
Take the white from a rice and you've got yourself a wheat.
What's a roadrunner but a owl with patience?
There's a fella who eats the delicatessen and says the delicatessen's no good.
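Several sample lines put "a" before a vowel-initial noun ("A emu", "a owl", "a earthworm", "a iris"). One possible fix is to choose the article from the slot filler before it is substituted into a template. This is an assumption, not the generator's current behavior. The helper `with_article` and the `ARTICLE_OVERRIDES` table are hypothetical. The first-letter test is only a heuristic, because the correct article depends on sound rather than spelling. For example, "unicycle" (in the vocabulary) takes "a", so the sketch lists it as an override.

```python
VOWELS = set("aeiou")

# Hypothetical overrides for words whose sound disagrees with their spelling.
ARTICLE_OVERRIDES = {"unicycle": "a", "hour": "an"}


def with_article(noun, capitalize=False):
    """Prefix a noun with "a" or "an" using a first-letter heuristic."""
    word = noun.strip()
    article = ARTICLE_OVERRIDES.get(word.lower())
    if article is None:
        article = "an" if word[:1].lower() in VOWELS else "a"
    if capitalize:
        article = article.capitalize()  # for slots at the start of a sentence
    return f"{article} {word}"


template = "What's {a} but {b} with patience?"
print(template.format(a=with_article("roadrunner"), b=with_article("owl")))
# What's a roadrunner but an owl with patience?
```

Templates would then carry bare `{a}`/`{b}` slots instead of hard-coding "a" in the template text.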
717
folksy_generator.py
Normal file
@@ -0,0 +1,717 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Folksy Idiom Generator — Procedural fake-proverb generator using ConceptNet relationships."""
|
||||
|
||||
import argparse
|
||||
import csv
|
||||
import json
|
||||
import os
|
||||
import random
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
|
||||
DATA_DIR = Path(__file__).parent / "data"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Graph data structures
|
||||
# ---------------------------------------------------------------------------
|
||||

class FolksyGraph:
    """In-memory graph of folksy vocabulary and their ConceptNet relationships."""

    def __init__(self):
        self.vocab = {}  # word -> {categories, tangibility, edge_count}
        self.by_category = defaultdict(list)  # category -> [words]
        self.edges = defaultdict(list)  # (start, relation) -> [(end, weight, surface)]
        self.reverse = defaultdict(list)  # (end, relation) -> [(start, weight, surface)]
        self.all_edges = defaultdict(list)  # start -> [(end, relation, weight)]
        self.all_words = []

    def load(self, vocab_path=None, relations_path=None):
        vocab_path = vocab_path or (DATA_DIR / "folksy_vocab.csv")
        relations_path = relations_path or (DATA_DIR / "folksy_relations.csv")

        with open(vocab_path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for row in reader:
                word = row["word"]
                cats = [c.strip() for c in row["categories"].split(",") if c.strip()]
                self.vocab[word] = {
                    "categories": cats,
                    "tangibility": float(row.get("tangibility_score", 0)),
                    "edge_count": int(row.get("conceptnet_edge_count", 0)),
                }
                for cat in cats:
                    self.by_category[cat].append(word)
        self.all_words = list(self.vocab.keys())

        with open(relations_path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for row in reader:
                sw = row["start_word"]
                ew = row["end_word"]
                rel = row["relation"]
                w = float(row["weight"])
                surf = row.get("surface_text", "")
                self.edges[(sw, rel)].append((ew, w, surf))
                self.reverse[(ew, rel)].append((sw, w, surf))
                self.all_edges[sw].append((ew, rel, w))
                self.all_edges[ew].append((sw, rel, w))

    def merge_fictional(self, entities_path):
        """Merge fictional entities into the graph."""
        with open(entities_path, encoding="utf-8") as f:
            data = json.load(f)

        for entity in data.get("entities", []):
            name = entity["name"].lower()
            cats = entity.get("categories", [])
            props = entity.get("properties", [])

            # Inherit from parents
            inherited_relations = defaultdict(list)
            for parent in entity.get("derived_from", []):
                parent = parent.lower()
                if parent in self.vocab:
                    parent_cats = self.vocab[parent]["categories"]
                    cats = list(set(cats + parent_cats))
                    # Gather all edges from parent
                    for (sw, rel), targets in list(self.edges.items()):
                        if sw == parent:
                            for (ew, w, surf) in targets:
                                inherited_relations[rel].append((ew, w, ""))
                    for (ew, rel), sources in list(self.reverse.items()):
                        if ew == parent:
                            for (sw, w, surf) in sources:
                                inherited_relations[rel].append((sw, w, ""))

            # Register the entity as a vocab word
            self.vocab[name] = {
                "categories": cats,
                "tangibility": 0.5,
                "edge_count": 0,
            }
            for cat in cats:
                self.by_category[cat].append(name)
            self.all_words.append(name)

            # Add inherited relations (lower priority)
            for rel, targets in inherited_relations.items():
                for (target, w, surf) in targets:
                    self.edges[(name, rel)].append((target, w, ""))
                    self.reverse[(target, rel)].append((name, w, ""))
                    self.all_edges[name].append((target, rel, w))

            # Add explicit relations (override)
            for rel, targets in entity.get("relations", {}).items():
                for target in targets:
                    target_lower = target.lower()
                    self.edges[(name, rel)].append((target_lower, 2.0, ""))
                    self.reverse[(target_lower, rel)].append((name, 2.0, ""))
                    self.all_edges[name].append((target_lower, rel, 2.0))

            # Add properties as HasProperty edges
            for prop in props:
                self.edges[(name, "HasProperty")].append((prop.lower(), 2.0, ""))
                self.all_edges[name].append((prop.lower(), "HasProperty", 2.0))

    def neighbors(self, word, relation=None, min_weight=0.0, vocab_only=False):
        """Get neighbors of a word, optionally filtered by relation type.

        Args:
            vocab_only: If True, only return neighbors that are in the folksy vocab.
                If False (default), return all neighbors including action
                phrases, properties, etc.
        """
        if relation:
            return [(ew, w, s) for (ew, w, s) in self.edges.get((word, relation), [])
                    if w >= min_weight and (not vocab_only or ew in self.vocab)]
        results = []
        for (ew, rel, w) in self.all_edges.get(word, []):
            if w >= min_weight and (not vocab_only or ew in self.vocab):
                results.append((ew, rel, w))
        return results

    def vocab_neighbors(self, word, relation=None, min_weight=0.0):
        """Get neighbors restricted to folksy vocab words only."""
        return self.neighbors(word, relation, min_weight, vocab_only=True)

    def two_hop(self, word, rel1, rel2, min_weight=0.5):
        """Find 2-hop paths: word -[rel1]-> bridge -[rel2]-> target.

        Bridge can be any word; target must be in folksy vocab.
        """
        results = []
        for (bridge, w1, _) in self.edges.get((word, rel1), []):
            for (target, w2, _) in self.edges.get((bridge, rel2), []):
                if target != word and target in self.vocab and w2 >= min_weight:
                    results.append((bridge, target, w1, w2))
        return results

    def two_hop_any(self, word, rel1, rel2, min_weight=0.5):
        """Find 2-hop paths where target can be any word (not just vocab)."""
        results = []
        for (bridge, w1, _) in self.edges.get((word, rel1), []):
            for (target, w2, _) in self.edges.get((bridge, rel2), []):
                if target != word and w2 >= min_weight:
                    results.append((bridge, target, w1, w2))
        return results

    def random_word(self, category=None):
        """Pick a random word, optionally from a specific category."""
        if category and category in self.by_category:
            pool = self.by_category[category]
        else:
            pool = self.all_words
        return random.choice(pool) if pool else None


# ---------------------------------------------------------------------------
# Meta-templates
# ---------------------------------------------------------------------------

class MetaTemplate:
    """Base class for meta-template families."""

    id = "base"
    name = "Base Template"
    surface_templates = []

    def __init__(self, graph):
        self.graph = graph

    def generate(self, seed_word=None, seed_category=None):
        """Attempt to generate a saying. Returns (saying, debug_info) or (None, None)."""
        raise NotImplementedError

    def _pick_template(self):
        return random.choice(self.surface_templates)

    def _seed(self, seed_word=None, seed_category=None):
        if seed_word:
            return seed_word.lower()
        return self.graph.random_word(seed_category)


class Deconstruction(MetaTemplate):
    """A without B is just humble D."""

    id = "deconstruction"
    name = "Deconstruction"
    surface_templates = [
        "You know what they say, a {A} with no {B} is just a {C} {D}.",
        "Take the {B} out of {A} and all you've got left is {C} {D}.",
        "{A} without {B}? That's just {D} with ideas above its station.",
        "An {A} ain't nothing but {D} that met some {B}.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        a = self._seed(seed_word, seed_category)
        if not a:
            return None, None

        # Find what A is made of / requires
        ingredients = []
        for rel in ("MadeOf", "HasPrerequisite", "HasA"):
            ingredients.extend(_short_concepts(self.graph.neighbors(a, rel, min_weight=0.5)))

        if len(ingredients) < 2:
            for rel in ("MadeOf", "HasPrerequisite"):
                for (start, w, s) in self.graph.reverse.get((a, rel), []):
                    if len(start.split("_")) <= 2:
                        ingredients.append((start, w, s))

        if len(ingredients) < 2:
            return None, None

        random.shuffle(ingredients)
        b_word = _readable(ingredients[0][0])
        d_word = _readable(ingredients[1][0])

        # Find a property for D
        props = self.graph.neighbors(ingredients[1][0], "HasProperty")
        if props:
            c_word = _readable(random.choice(props)[0])
        else:
            c_word = random.choice(["plain", "sorry", "old", "humble", "dry", "wet", "cold"])

        template = self._pick_template()
        saying = template.format(A=a, B=b_word, C=c_word, D=d_word)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{a} MadeOf/Has [{b_word}, {d_word}]; {d_word} HasProperty {c_word}",
            "slots": {"A": a, "B": b_word, "C": c_word, "D": d_word},
        }
        return saying, debug


class DenialOfConsequences(MetaTemplate):
    """Don't create conditions for B and deny B."""

    id = "denial_of_consequences"
    name = "Denial of Consequences"
    surface_templates = [
        "Don't {C} the {A} and say you ain't got {B}.",
        "Don't {C} the {A} and act surprised when the {B} show up.",
        "Man who {C}s a {A} can't complain about {B}.",
        "You can't {C} a {A} and then wonder where all the {B} came from.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        a = self._seed(seed_word, seed_category)
        if not a:
            return None, None

        # What is found at A? (reverse: B AtLocation A)
        attracted = []
        for (b, w, s) in self.graph.reverse.get((a, "AtLocation"), []):
            attracted.append((b, w))

        # Also: what does A attract/cause?
        for rel in ("Causes", "CausesDesire"):
            for (b, w, s) in self.graph.edges.get((a, rel), []):
                attracted.append((b, w))

        if not attracted:
            for (bridge, target, w1, w2) in self.graph.two_hop(a, "UsedFor", "AtLocation"):
                attracted.append((target, w1 + w2))

        if not attracted:
            return None, None

        b_word = _readable(random.choice(attracted)[0])

        create_verbs = {
            "pond": "dig", "birdhouse": "hang", "fence": "build", "trap": "set",
            "fire": "light", "garden": "plant", "nest": "build", "well": "dig",
            "bridge": "build", "barn": "raise", "path": "clear", "stable": "raise",
            "coop": "build", "den": "dig", "ditch": "dig", "furrow": "plow",
            "orchard": "plant", "hearth": "lay", "chimney": "build",
        }
        c_word = create_verbs.get(a)
        if not c_word:
            c_word = random.choice(["build", "set up", "put out", "lay down", "make"])

        template = self._pick_template()
        saying = template.format(A=a, B=b_word, C=c_word)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{b_word} AtLocation {a}; {a} created by {c_word}",
            "slots": {"A": a, "B": b_word, "C": c_word},
        }
        return saying, debug


class IronicDeficiency(MetaTemplate):
    """Producer of X lacks X."""

    id = "ironic_deficiency"
    name = "Ironic Deficiency"
    surface_templates = [
        "The {A}'s {F} always goes without {X}.",
        "Nobody's got less {X} than the man who makes the {A}.",
        "Funny how the {A} never has enough {X} for itself.",
        "The {A} feeds everyone's {X} but its own.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        a = self._seed(seed_word, seed_category)
        if not a:
            return None, None

        products = []
        for rel in ("UsedFor", "CapableOf", "Causes"):
            products.extend(self.graph.neighbors(a, rel, min_weight=0.5))

        products = _short_concepts(products)
        if not products:
            return None, None

        x_word = _readable(random.choice(products)[0])

        family_members = ["wife", "children", "household", "family", "own kind"]
        f_word = random.choice(family_members)

        template = self._pick_template()
        saying = template.format(A=a, X=x_word, F=f_word)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{a} UsedFor/Produces {x_word}; irony: {a} lacks {x_word}",
            "slots": {"A": a, "X": x_word, "F": f_word},
        }
        return saying, debug


class FutilePreparation(MetaTemplate):
    """Like doing A and hoping for unrelated Y."""

    id = "futile_preparation"
    name = "Futile Preparation"
    surface_templates = [
        "Like {A_gerund} and hoping for {Y}.",
        "That's just {A_gerund} and praying for {Y}.",
        "My grandmother used to say, '{A_gerund} won't bring you {Y}.'",
        "You can {A_verb} all you want, it still won't get you {Y}.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        # Find an action and a desired outcome that are in the same domain but mismatched
        seed = self._seed(seed_word, seed_category)
        if not seed:
            return None, None

        # What is the seed used for?
        uses = _short_concepts(self.graph.neighbors(seed, "UsedFor", min_weight=0.5), max_words=2)
        if not uses:
            return None, None

        action_word = random.choice(uses)[0]

        # Find a different outcome in a related domain via 2-hop
        outcomes = []
        for rel in ("Causes", "UsedFor", "HasSubevent"):
            hops = self.graph.two_hop_any(seed, "AtLocation", rel)
            outcomes.extend([(_readable(t), w1 + w2) for (_, t, w1, w2) in hops])

        # Also try: things that siblings are UsedFor
        seed_cats = self.graph.vocab.get(seed, {}).get("categories", [])
        for cat in seed_cats:
            siblings = self.graph.by_category.get(cat, [])
            for sib in random.sample(siblings, min(5, len(siblings))):
                if sib != seed:
                    for (target, w, s) in self.graph.edges.get((sib, "UsedFor"), []):
                        if target != action_word:
                            outcomes.append((_readable(target), w))

        if not outcomes:
            return None, None

        y_word = random.choice(outcomes)[0]

        gerund = _gerund(action_word)
        verb = _readable(action_word)

        template = self._pick_template()
        saying = template.format(A_gerund=f"{gerund} the {seed}", Y=y_word,
                                 A_verb=f"{verb} the {seed}")

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{seed} UsedFor {action_word}; different domain: {y_word}",
            "slots": {"seed": seed, "action": action_word, "Y": y_word},
        }
        return saying, debug


class HypocriticalComplaint(MetaTemplate):
    """Consumes X from system Z, complains about remaining Y."""

    id = "hypocritical_complaint"
    name = "Hypocritical Complaint"
    surface_templates = [
        "There's a fella who {verb}s the {X} and says the {Y}'s no good.",
        "That's like eating the {X} and complaining the {Y} tastes off.",
        "He picks all the {X} then wonders why the {Y} looks bare.",
        "Don't {verb} the {X} and then gripe about the {Y}.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        # Z is the whole, X and Y are parts
        z = self._seed(seed_word, seed_category)
        if not z:
            return None, None

        # Find parts of Z
        parts = []
        for rel in ("HasA", "PartOf", "MadeOf"):
            parts.extend(_short_concepts(self.graph.neighbors(z, rel, min_weight=0.5)))
        for (start, w, s) in self.graph.reverse.get((z, "PartOf"), []):
            if len(start.split("_")) <= 2:
                parts.append((start, w, s))
        for (start, w, s) in self.graph.reverse.get((z, "HasA"), []):
            if len(start.split("_")) <= 2:
                parts.append((start, w, s))

        if len(parts) < 2:
            return None, None

        random.shuffle(parts)
        x_word = _readable(parts[0][0])
        y_word = _readable(parts[1][0])

        consume_verbs = ["eat", "drink", "take", "pick", "use up", "grab"]
        verb = random.choice(consume_verbs)

        template = self._pick_template()
        saying = template.format(X=x_word, Y=y_word, verb=verb)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{x_word} PartOf/HasA {z}; {y_word} PartOf/HasA {z}",
            "slots": {"Z": z, "X": x_word, "Y": y_word, "verb": verb},
        }
        return saying, debug


class TautologicalWisdom(MetaTemplate):
    """States obvious causal/prerequisite as wisdom."""

    id = "tautological_wisdom"
    name = "Tautological Wisdom"
    surface_templates = [
        "You know what they say, it takes a {X} to get a {Y}.",
        "My daddy always said, can't have {Y} without {X}.",
        "A {Y} don't come without its {X}, now does it?",
        "You want {Y}? Well, first you're gonna need {X}.",
        "Ain't no {Y} ever came from nothing — you need {X}.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        seed = self._seed(seed_word, seed_category)
        if not seed:
            return None, None

        # seed HasPrerequisite/Causes something
        chains = []
        for (target, w, s) in self.graph.edges.get((seed, "HasPrerequisite"), []):
            chains.append((_readable(target), seed, w))  # X=prereq, Y=seed
        for (target, w, s) in self.graph.edges.get((seed, "Causes"), []):
            chains.append((seed, _readable(target), w))  # X=seed, Y=effect
        # Also: what requires seed? (source HasPrerequisite seed, so X=seed, Y=source)
        for (source, w, s) in self.graph.reverse.get((seed, "HasPrerequisite"), []):
            chains.append((seed, _readable(source), w))

        if not chains:
            return None, None

        x_word, y_word, _ = random.choice(chains)

        template = self._pick_template()
        saying = template.format(X=x_word, Y=y_word)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{x_word} -> {y_word} (prerequisite/cause)",
            "slots": {"X": x_word, "Y": y_word},
        }
        return saying, debug


class FalseEquivalence(MetaTemplate):
    """A is just B with/without property P."""

    id = "false_equivalence"
    name = "False Equivalence"
    surface_templates = [
        "A {A} is just a {B} that's got {P}.",
        "What's a {A} but a {B} with {P}?",
        "The only difference between a {A} and a {B} is {P}.",
        "Take the {P} from a {A} and you've got yourself a {B}.",
    ]

    def generate(self, seed_word=None, seed_category=None):
        a = self._seed(seed_word, seed_category)
        if not a:
            return None, None

        a_cats = set(self.graph.vocab.get(a, {}).get("categories", []))
        if not a_cats:
            return None, None

        # Find siblings (same category, different word)
        siblings = []
        for cat in a_cats:
            for sib in self.graph.by_category.get(cat, []):
                if sib != a:
                    siblings.append(sib)

        if not siblings:
            return None, None

        b_word = random.choice(siblings)

        # Find a property of A that B might lack
        a_props = _short_concepts(self.graph.neighbors(a, "HasProperty"), max_words=2)
        b_props = set(p[0] for p in self.graph.neighbors(b_word, "HasProperty"))

        differentiators = [p for p in a_props if p[0] not in b_props]
        if differentiators:
            p_word = _readable(random.choice(differentiators)[0])
        elif a_props:
            p_word = _readable(random.choice(a_props)[0])
        else:
            a_caps = self.graph.neighbors(a, "CapableOf")
            if a_caps:
                p_word = _readable(random.choice(a_caps)[0])
            else:
                p_word = random.choice(["ambition", "an attitude", "a plan", "patience"])

        template = self._pick_template()
        saying = template.format(A=a, B=b_word, P=p_word)

        debug = {
            "template_family": self.id,
            "template": template,
            "chain": f"{a} IsA same category as {b_word}; {a} HasProperty {p_word}",
            "slots": {"A": a, "B": b_word, "P": p_word},
        }
        return saying, debug


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _readable(concept):
    """Convert ConceptNet concept to readable form: 'feed_chicken' -> 'feed chicken'."""
    return concept.replace("_", " ")


def _short_concepts(items, max_words=3):
    """Filter concept tuples to only those with short readable names.

    Items can be tuples where first element is the concept string.
    Returns items where the concept has at most max_words words.
    """
    return [item for item in items if len(item[0].split("_")) <= max_words]


def _gerund(word):
    """Rough gerund form of a verb/action word."""
    word = word.split("_")[0] if "_" in word else word  # take first word for compounds
    if word.endswith("e") and not word.endswith("ee"):
        return word[:-1] + "ing"
    if word.endswith("ing"):
        return word
    if len(word) > 2 and word[-1] not in "aeiou" and word[-2] in "aeiou" and word[-3] not in "aeiou":
        return word + word[-1] + "ing"
    return word + "ing"


def _a(word):
    """Add 'a' or 'an' article."""
    if word and word[0] in "aeiou":
        return f"an {word}"
    return f"a {word}"


TEMPLATE_REGISTRY = {
    "deconstruction": Deconstruction,
    "denial_of_consequences": DenialOfConsequences,
    "ironic_deficiency": IronicDeficiency,
    "futile_preparation": FutilePreparation,
    "hypocritical_complaint": HypocriticalComplaint,
    "tautological_wisdom": TautologicalWisdom,
    "false_equivalence": FalseEquivalence,
}


# ---------------------------------------------------------------------------
# Main generation logic
# ---------------------------------------------------------------------------

def generate_one(graph, template_id=None, seed_word=None, seed_category=None,
                 debug=False, max_retries=20):
    """Generate a single folksy saying.

    Returns (saying, debug_info); debug_info is None unless debug=True,
    and (None, None) is returned if every retry fails.
    """
    for _ in range(max_retries):
        if template_id:
            tid = template_id
        else:
            tid = random.choice(list(TEMPLATE_REGISTRY.keys()))

        cls = TEMPLATE_REGISTRY.get(tid)
        if not cls:
            print(f"Unknown template: {tid}", file=sys.stderr)
            # Return a pair so callers that unpack `saying, dbg` don't crash.
            return None, None

        tmpl = cls(graph)
        saying, dbg = tmpl.generate(seed_word=seed_word, seed_category=seed_category)
        if saying:
            if debug:
                return saying, dbg
            return saying, None

    return None, None


def main():
    parser = argparse.ArgumentParser(
        description="Generate folksy fake-proverbs using ConceptNet relationships."
    )
    parser.add_argument("--template", "-t", choices=list(TEMPLATE_REGISTRY.keys()),
                        help="Specify a meta-template family")
    parser.add_argument("--seed", "-s", help="Seed with a specific word")
    parser.add_argument("--category", "-c", help="Seed with a category (e.g., animal, tool)")
    parser.add_argument("--entities", "-e", help="Path to fictional entities JSON file")
    parser.add_argument("--count", "-n", type=int, default=1, help="Number of sayings to generate")
    parser.add_argument("--output", "-o", help="Output file (default: stdout)")
    parser.add_argument("--debug", "-d", action="store_true", help="Show relationship chain debug info")
    parser.add_argument("--vocab", help="Path to folksy_vocab.csv")
    parser.add_argument("--relations", help="Path to folksy_relations.csv")
    parser.add_argument("--list-templates", action="store_true", help="List available templates")
    parser.add_argument("--list-categories", action="store_true", help="List available categories")

    args = parser.parse_args()

    if args.list_templates:
        for tid, cls in TEMPLATE_REGISTRY.items():
            print(f" {tid:30s} {cls.name}")
        return

    # Load graph
    graph = FolksyGraph()
    try:
        graph.load(
            vocab_path=args.vocab or (DATA_DIR / "folksy_vocab.csv"),
            relations_path=args.relations or (DATA_DIR / "folksy_relations.csv"),
        )
    except FileNotFoundError as e:
        print(f"Error: {e}", file=sys.stderr)
        print("Run scripts/extract_from_conceptnet.py first to generate data files.", file=sys.stderr)
        sys.exit(1)

    if args.list_categories:
        for cat in sorted(graph.by_category.keys()):
            print(f" {cat:20s} ({len(graph.by_category[cat])} words)")
        return

    # Merge fictional entities
    if args.entities:
        graph.merge_fictional(args.entities)

    # Generate
    out = open(args.output, "w", encoding="utf-8") if args.output else sys.stdout
    try:
        for i in range(args.count):
            saying, dbg = generate_one(
                graph,
                template_id=args.template,
                seed_word=args.seed,
                seed_category=args.category,
                debug=args.debug,
            )
            if saying:
                out.write(saying + "\n")
                if args.debug and dbg:
                    out.write(f" [DEBUG] family={dbg['template_family']}\n")
                    out.write(f" [DEBUG] chain: {dbg['chain']}\n")
                    out.write(f" [DEBUG] slots: {dbg['slots']}\n")
                    out.write("\n")
            else:
                out.write(f"(failed to generate saying #{i+1} after retries)\n")
    finally:
        if args.output:
            out.close()


if __name__ == "__main__":
    main()
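`FolksyGraph` stores every edge in two indexes: `edges`, keyed by `(start, relation)`, and `reverse`, keyed by `(end, relation)`. Both "what does A point to?" and "what points at A?" are therefore single dictionary lookups, and `two_hop` simply chains two forward lookups. The standalone sketch below mirrors that layout on a four-edge toy graph. The edges are invented for illustration and are not taken from ConceptNet, and entries are trimmed to `(word, weight)` pairs, whereas the real indexes also carry surface text.

```python
from collections import defaultdict

# Toy (start, relation, end, weight) edges, invented for illustration.
EDGES = [
    ("corn", "UsedFor", "feed_chicken", 1.0),
    ("feed_chicken", "AtLocation", "coop", 0.8),
    ("duck", "AtLocation", "pond", 2.0),
    ("frog", "AtLocation", "pond", 1.5),
]

forward = defaultdict(list)  # (start, relation) -> [(end, weight)]
reverse = defaultdict(list)  # (end, relation) -> [(start, weight)]
for start, rel, end, weight in EDGES:
    forward[(start, rel)].append((end, weight))
    reverse[(end, rel)].append((start, weight))


def two_hop(word, rel1, rel2, vocab, min_weight=0.5):
    """word -[rel1]-> bridge -[rel2]-> target; the target must be in vocab."""
    results = []
    # .get() reads without inserting empty lists into the defaultdicts.
    for bridge, w1 in forward.get((word, rel1), []):
        for target, w2 in forward.get((bridge, rel2), []):
            if target != word and target in vocab and w2 >= min_weight:
                results.append((bridge, target, w1, w2))
    return results


vocab = {"corn", "coop", "duck", "frog", "pond"}
# "What is found at a pond?" is one lookup in the reverse index.
print(reverse.get(("pond", "AtLocation"), []))          # [('duck', 2.0), ('frog', 1.5)]
print(two_hop("corn", "UsedFor", "AtLocation", vocab))  # [('feed_chicken', 'coop', 1.0, 0.8)]
```

The bridge (`feed_chicken`) does not need to be in the vocabulary; only the final target does. This matches the `two_hop` docstring in the file. Reading through `.get()` matters with `defaultdict`: plain indexing on a missing key would insert an empty list. The generator's lookups also go through `.get()`.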
48
schemas/fictional_entities.schema.json
Normal file
@@ -0,0 +1,48 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Fictional Entity Registry",
  "description": "Custom entities for the folksy idiom generator. Each entity declares its category roles and relationships, allowing it to fill template slots alongside real ConceptNet vocabulary.",
  "type": "object",
  "required": ["entities"],
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name", "categories", "relations"],
        "properties": {
          "name": {
            "type": "string",
            "description": "Display name of the entity (e.g., 'Xorhir', 'turtleduck')"
          },
          "categories": {
            "type": "array",
            "items": { "type": "string" },
            "minItems": 1,
            "description": "Category roles this entity can fill (e.g., ['animal', 'mount'])"
          },
          "relations": {
            "type": "object",
            "description": "Typed relationships to other entities or real words. Keys are ConceptNet relation types (e.g., 'AtLocation', 'UsedFor', 'HasA', 'MadeOf', 'CapableOf', 'Causes', 'HasPrerequisite', 'PartOf', 'ReceivesAction', 'Desires'). Values are arrays of target words.",
            "additionalProperties": {
              "type": "array",
              "items": { "type": "string" }
            }
          },
          "properties": {
            "type": "array",
            "items": { "type": "string" },
            "description": "Adjectives/properties of this entity (e.g., ['fast', 'stubborn', 'scaly']). Used to fill HasProperty slots in templates."
          },
          "derived_from": {
            "type": "array",
            "items": { "type": "string" },
            "description": "Real-world words to inherit relations from (e.g., ['turtle', 'duck'] for a turtleduck). All parent relations are unioned, with this entity's explicit relations taking priority."
          }
        },
        "additionalProperties": false
      }
    }
  },
  "additionalProperties": false
}
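`merge_fictional()` reads entity fields with plain dict lookups and does not check them against this schema. A missing `name` raises `KeyError`, and a relation mapped to a bare string instead of an array would be iterated character by character. The third-party `jsonschema` package (`jsonschema.validate(instance, schema)`) can enforce the full schema. The stdlib-only sketch below hand-checks a few of its rules on a single item of the `entities` array: required keys, `additionalProperties: false`, `minItems: 1`, and string-array relation values. `check_entity` and the sample entities are hypothetical.

```python
import json

# Key names copied from schemas/fictional_entities.schema.json.
REQUIRED = {"name", "categories", "relations"}
ALLOWED = REQUIRED | {"properties", "derived_from"}


def check_entity(entity):
    """Return a list of problems; [] means the entity passes these checks.

    Hypothetical helper covering only a subset of the schema's rules.
    """
    problems = []
    keys = set(entity)
    missing = REQUIRED - keys
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    extra = keys - ALLOWED  # the schema sets "additionalProperties": false
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    cats = entity.get("categories")
    if cats is not None and len(cats) < 1:  # "minItems": 1
        problems.append("categories must have at least one item")
    relations = entity.get("relations", {})
    if not isinstance(relations, dict):
        problems.append("relations must be an object")
        relations = {}
    for rel, targets in relations.items():
        # Each relation must map to an array of strings, not a bare string.
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            problems.append(f"relation {rel!r} must map to a list of strings")
    return problems


entity = json.loads("""
{
  "name": "turtleduck",
  "categories": ["animal", "bird"],
  "relations": {"AtLocation": ["pond"], "CapableOf": ["swim"]},
  "properties": ["slow", "scaly"],
  "derived_from": ["turtle", "duck"]
}
""")
print(check_entity(entity))  # []
print(check_entity({"name": "xorhir", "relations": {"HasA": "tail"}, "colour": "grey"}))
```

The second call reports three problems: the missing `categories` key, the unexpected `colour` key, and the `HasA` relation mapped to a bare string.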
1234
scripts/classify_proverbs.py
Normal file
File diff suppressed because it is too large
387
scripts/extract_from_conceptnet.py
Normal file
@@ -0,0 +1,387 @@
#!/usr/bin/env python3
"""
Build folksy vocabulary CSV from ConceptNet5 PostgreSQL database.
Steps:
1. Gather candidates via IsA categories
2. Filter to single-word concrete nouns
3. Calculate tangibility score
4. Count total edges
5. Add manual additions
6. Write output CSV
"""

import psycopg2
import csv
import sys

DB_NAME = "conceptnet5"

# IsA categories and their node IDs (pre-looked up)
CATEGORY_IDS = {
    20865: 'animal', 22802: 'beverage', 20866: 'bird', 40218: 'building',
    21578: 'clothing', 144957: 'container', 26028: 'crop', 148705: 'fabric',
    22922: 'fish', 26249: 'flower', 22803: 'food', 31187: 'fruit',
    22610: 'furniture', 114948: 'grain', 641297: 'herb', 152432: 'insect',
    152437: 'instrument', 153470: 'livestock', 33562: 'material',
    25893: 'metal', 20869: 'mineral', 20872: 'plant', 37511: 'rock',
    25753: 'seed', 44101: 'spice', 37357: 'stone', 159255: 'tool',
    40174: 'tree', 20874: 'vegetable', 144388: 'vehicle', 156331: 'weapon',
    31507: 'wood'
}

# Relation IDs (pre-looked up from relations table)
RELATION_IDS = {
    'AtLocation': 1, 'MadeOf': 25, 'PartOf': 33, 'UsedFor': 39,
    'HasA': 15, 'ReceivesAction': 34, 'CreatedBy': 5,
    'HasProperty': 20, 'Causes': 3, 'MotivatedByGoal': 27,
    'CausesDesire': 4, 'Desires': 8, 'HasSubevent': 21
}

CONCRETE_RELS = [RELATION_IDS[r] for r in ['AtLocation', 'MadeOf', 'PartOf', 'UsedFor', 'HasA', 'ReceivesAction', 'CreatedBy']]
ABSTRACT_RELS = [RELATION_IDS[r] for r in ['HasProperty', 'Causes', 'MotivatedByGoal', 'CausesDesire', 'Desires', 'HasSubevent']]

MANUAL_ADDITIONS = [
    'well', 'fence', 'barn', 'creek', 'porch', 'chimney', 'saddle', 'hearth',
    'kettle', 'plow', 'silo', 'trough', 'yoke', 'anvil', 'bellows', 'thimble',
    'lantern', 'candle', 'broom', 'bucket', 'ladder', 'rope', 'nail', 'hay',
    'straw', 'wool', 'leather', 'tar', 'wax', 'cork', 'flint', 'chalk', 'clay',
    'ash', 'soot', 'rust', 'mold', 'moss', 'bark', 'root', 'stem', 'thorn',
    'vine', 'husk', 'shell', 'pit', 'den', 'nest', 'burrow', 'coop', 'stable',
    'pasture', 'meadow', 'orchard', 'garden', 'pond', 'ditch', 'ridge',
    'hollow', 'furrow'
]

# Common-sense categories for manual additions that might not have IsA edges
MANUAL_CATEGORIES = {
    'well': 'structure', 'fence': 'structure', 'barn': 'building',
    'creek': 'water,landscape', 'porch': 'structure', 'chimney': 'structure',
    'saddle': 'tool', 'hearth': 'structure', 'kettle': 'container',
    'plow': 'tool', 'silo': 'building', 'trough': 'container',
    'yoke': 'tool', 'anvil': 'tool', 'bellows': 'tool',
    'thimble': 'tool', 'lantern': 'tool', 'candle': 'tool',
    'broom': 'tool', 'bucket': 'container', 'ladder': 'tool',
    'rope': 'material', 'nail': 'tool', 'hay': 'plant,crop',
    'straw': 'material,crop', 'wool': 'fabric,material',
    'leather': 'fabric,material', 'tar': 'material', 'wax': 'material',
    'cork': 'material', 'flint': 'stone', 'chalk': 'material,mineral',
    'clay': 'material', 'ash': 'material', 'soot': 'material',
    'rust': 'material', 'mold': 'organism', 'moss': 'plant',
    'bark': 'plant', 'root': 'plant', 'stem': 'plant',
    'thorn': 'plant', 'vine': 'plant', 'husk': 'plant',
    'shell': 'container', 'pit': 'seed,landscape', 'den': 'shelter',
    'nest': 'shelter', 'burrow': 'shelter', 'coop': 'building',
    'stable': 'building', 'pasture': 'landscape', 'meadow': 'landscape',
    'orchard': 'landscape', 'garden': 'landscape', 'pond': 'water,landscape',
    'ditch': 'landscape', 'ridge': 'landscape', 'hollow': 'landscape',
    'furrow': 'landscape'
}

# Words to exclude (misspellings, plural forms, overly abstract, non-folksy)
EXCLUDE_WORDS = {
    'bannana', 'brocolli', 'cardimom', 'carary', 'cassorwary', 'cucmber',
    'cummin', 'dragonsnap', 'elefefant', 'guitare', 'hollie', 'potoato',
    'rhodedendron', 'sandwitch', 'saphire', 'saxiphone', 'soupd', 'tourqouise',
    'tiramisu', 'bbq', 'cajun', 'mexican', 'pepsi', 'coke', 'spam', 'accordian',
    'comealong', 'rooter', 'tweety', 'guru1', 'softball', 'nutdriver',
    'posessions', 'anus', 'bloodsucker', 'whorehouse', 'cuck',
    # Plurals when singular exists
    'blueberries', 'carrots', 'eggs', 'pears', 'peas', 'peaches', 'limes',
    'raisins', 'plums', 'rubies', 'emeralds', 'shirts', 'shoes', 'tomatoes',
    'potatoes', 'plastics', 'vegetables', 'animals', 'products', 'vertebrates',
    'pianos', 'lures', 'pens', 'crampons',
    # Too technical/non-folksy
    'bronchoscope', 'dioptometer', 'calibrachoa', 'brachycome', 'diascia',
    'osteospermum', 'nemesia', 'helichrysum', 'scavola', 'silphium',
    'cuphea', 'euonymus', 'arborvitae', 'ipomoea', 'bacopa', 'lamium',
    'falsecypress', 'boottree', 'sedimentary', 'catheter', 'caltrops',
    'argyranthemum', 'sunn',
    # Too generic/abstract
    'creature', 'invertebrate', 'primate', 'marsupial', 'crustacean',
    'arthropod', 'avian', 'amphibian', 'rodent', 'pet', 'explosive',
    'automatic', 'percussion', 'woodwind', 'laundry', 'products',
    # fictional
    'unicorn', 'dragon', 'pinguin',
    # remaining misspellings / obscure non-folksy fish
    'trumbone', 'eidar', 'monchong', 'opakapaka', 'opah', 'cumquat',
|
||||
}
|
||||
|
||||
|
||||
def connect():
|
||||
return psycopg2.connect(dbname=DB_NAME)
|
||||
|
||||
|
||||
def step1_gather_candidates(conn):
|
||||
"""Gather all English base single-word nodes that IsA our categories."""
|
||||
print("Step 1: Gathering IsA candidates...")
|
||||
cur = conn.cursor()
|
||||
|
||||
category_id_list = ','.join(str(k) for k in CATEGORY_IDS.keys())
|
||||
|
||||
cur.execute(f"""
|
||||
SELECT
|
||||
SUBSTRING(n_start.uri FROM 6) AS word,
|
||||
ARRAY_AGG(DISTINCT n_end.id) AS cat_ids
|
||||
FROM edges e
|
||||
JOIN nodes n_start ON e.start_id = n_start.id
|
||||
JOIN nodes n_end ON e.end_id = n_end.id
|
||||
WHERE e.relation_id = 23
|
||||
AND e.weight >= 1.0
|
||||
AND n_start.uri LIKE '/c/en/%'
|
||||
AND n_start.uri NOT LIKE '/c/en/%/%%'
|
||||
AND n_start.uri NOT LIKE '/c/en/%%\\_%%'
|
||||
AND n_end.id IN ({category_id_list})
|
||||
GROUP BY n_start.uri
|
||||
""")
|
||||
|
||||
candidates = {}
|
||||
for word, cat_ids in cur.fetchall():
|
||||
if word.startswith('/'):
|
||||
word = word.lstrip('/')
|
||||
if word in EXCLUDE_WORDS:
|
||||
continue
|
||||
categories = sorted(set(CATEGORY_IDS[cid] for cid in cat_ids if cid in CATEGORY_IDS))
|
||||
candidates[word] = {
|
||||
'categories': categories,
|
||||
'tangibility_score': 0.0,
|
||||
'edge_count': 0
|
||||
}
|
||||
|
||||
cur.close()
|
||||
print(f" Found {len(candidates)} candidates after filtering")
|
||||
return candidates
|
||||
|
||||
|
||||
def step5_add_manual(conn, candidates):
|
||||
"""Add manual additions that aren't already in candidates."""
|
||||
print("Step 5: Adding manual additions...")
|
||||
added = 0
|
||||
for word in MANUAL_ADDITIONS:
|
||||
if word not in candidates:
|
||||
cats = MANUAL_CATEGORIES.get(word, 'misc').split(',')
|
||||
candidates[word] = {
|
||||
'categories': sorted(cats),
|
||||
'tangibility_score': 0.0,
|
||||
'edge_count': 0
|
||||
}
|
||||
added += 1
|
||||
else:
|
||||
# Merge manual categories with existing
|
||||
existing_cats = set(candidates[word]['categories'])
|
||||
manual_cats = set(MANUAL_CATEGORIES.get(word, '').split(',')) - {''}
|
||||
candidates[word]['categories'] = sorted(existing_cats | manual_cats)
|
||||
|
||||
print(f" Added {added} new words from manual list")
|
||||
print(f" Total candidates: {len(candidates)}")
|
||||
return candidates
|
||||
|
||||
|
||||
def step3_4_tangibility_and_edges(conn, candidates):
|
||||
"""Calculate tangibility scores and total edge counts for all candidates."""
|
||||
print("Steps 3-4: Calculating tangibility scores and edge counts...")
|
||||
|
||||
cur = conn.cursor()
|
||||
|
||||
# First, get all node IDs for our candidate words in one query
|
||||
words = list(candidates.keys())
|
||||
uris = [f'/c/en/{w}' for w in words]
|
||||
|
||||
# Batch lookup node IDs
|
||||
cur.execute("""
|
||||
SELECT uri, id FROM nodes
|
||||
WHERE uri = ANY(%s)
|
||||
""", (uris,))
|
||||
|
||||
word_to_node_id = {}
|
||||
for uri, nid in cur.fetchall():
|
||||
word = uri[6:] # strip '/c/en/' (Python 0-indexed: '/c/en/'=6 chars)
|
||||
word_to_node_id[word] = nid
|
||||
|
||||
# Debug: show a sample
|
||||
sample = list(word_to_node_id.items())[:5]
|
||||
print(f" Sample word->id mappings: {sample}")
|
||||
print(f" Found node IDs for {len(word_to_node_id)}/{len(words)} words")
|
||||
|
||||
# Words without node IDs - remove them
|
||||
missing = [w for w in words if w not in word_to_node_id]
|
||||
if missing:
|
||||
print(f" Missing from DB (removing): {missing[:20]}...")
|
||||
for w in missing:
|
||||
del candidates[w]
|
||||
|
||||
if not word_to_node_id:
|
||||
print(" ERROR: No node IDs found!")
|
||||
return candidates
|
||||
|
||||
node_ids = list(word_to_node_id.values())
|
||||
node_id_to_word = {v: k for k, v in word_to_node_id.items()}
|
||||
|
||||
concrete_rel_ids = CONCRETE_RELS
|
||||
abstract_rel_ids = ABSTRACT_RELS
|
||||
all_scored_rels = concrete_rel_ids + abstract_rel_ids
|
||||
|
||||
# Query: for each node (as start or end), count concrete and abstract edges
|
||||
# We need English-only counterparts, so we filter the other end to /c/en/
|
||||
# Do this in batches to avoid memory issues
|
||||
|
||||
batch_size = 200
|
||||
node_id_list = list(node_ids)
|
||||
|
||||
for batch_start in range(0, len(node_id_list), batch_size):
|
||||
batch = node_id_list[batch_start:batch_start + batch_size]
|
||||
batch_words = [node_id_to_word[nid] for nid in batch]
|
||||
|
||||
if batch_start % 1000 == 0:
|
||||
print(f" Processing batch {batch_start}/{len(node_id_list)}...")
|
||||
|
||||
# Concrete relation counts (as start node)
|
||||
cur.execute("""
|
||||
SELECT e.start_id, e.relation_id, COUNT(*)
|
||||
FROM edges e
|
||||
JOIN nodes n_other ON e.end_id = n_other.id
|
||||
WHERE e.start_id = ANY(%s)
|
||||
AND e.weight >= 1.0
|
||||
AND e.relation_id = ANY(%s)
|
||||
AND n_other.uri LIKE '/c/en/%%'
|
||||
GROUP BY e.start_id, e.relation_id
|
||||
""", (batch, all_scored_rels))
|
||||
|
||||
for nid, rel_id, cnt in cur.fetchall():
|
||||
word = node_id_to_word.get(nid)
|
||||
if not word or word not in candidates:
|
||||
continue
|
||||
if rel_id in concrete_rel_ids:
|
||||
candidates[word].setdefault('concrete_count', 0)
|
||||
candidates[word]['concrete_count'] += cnt
|
||||
elif rel_id in abstract_rel_ids:
|
||||
candidates[word].setdefault('abstract_count', 0)
|
||||
candidates[word]['abstract_count'] += cnt
|
||||
|
||||
# As end node
|
||||
cur.execute("""
|
||||
SELECT e.end_id, e.relation_id, COUNT(*)
|
||||
FROM edges e
|
||||
JOIN nodes n_other ON e.start_id = n_other.id
|
||||
WHERE e.end_id = ANY(%s)
|
||||
AND e.weight >= 1.0
|
||||
AND e.relation_id = ANY(%s)
|
||||
AND n_other.uri LIKE '/c/en/%%'
|
||||
GROUP BY e.end_id, e.relation_id
|
||||
""", (batch, all_scored_rels))
|
||||
|
||||
for nid, rel_id, cnt in cur.fetchall():
|
||||
word = node_id_to_word.get(nid)
|
||||
if not word or word not in candidates:
|
||||
continue
|
||||
if rel_id in concrete_rel_ids:
|
||||
candidates[word].setdefault('concrete_count', 0)
|
||||
candidates[word]['concrete_count'] += cnt
|
||||
elif rel_id in abstract_rel_ids:
|
||||
candidates[word].setdefault('abstract_count', 0)
|
||||
candidates[word]['abstract_count'] += cnt
|
||||
|
||||
# Total edge count (any relation, English counterpart, weight >= 1)
|
||||
cur.execute("""
|
||||
SELECT start_id, COUNT(*)
|
||||
FROM edges e
|
||||
JOIN nodes n_other ON e.end_id = n_other.id
|
||||
WHERE e.start_id = ANY(%s)
|
||||
AND e.weight >= 1.0
|
||||
AND n_other.uri LIKE '/c/en/%%'
|
||||
GROUP BY start_id
|
||||
""", (batch,))
|
||||
|
||||
for nid, cnt in cur.fetchall():
|
||||
word = node_id_to_word.get(nid)
|
||||
if word and word in candidates:
|
||||
candidates[word]['edge_count'] += cnt
|
||||
|
||||
cur.execute("""
|
||||
SELECT end_id, COUNT(*)
|
||||
FROM edges e
|
||||
JOIN nodes n_other ON e.start_id = n_other.id
|
||||
WHERE e.end_id = ANY(%s)
|
||||
AND e.weight >= 1.0
|
||||
AND n_other.uri LIKE '/c/en/%%'
|
||||
GROUP BY end_id
|
||||
""", (batch,))
|
||||
|
||||
for nid, cnt in cur.fetchall():
|
||||
word = node_id_to_word.get(nid)
|
||||
if word and word in candidates:
|
||||
candidates[word]['edge_count'] += cnt
|
||||
|
||||
# Calculate tangibility scores
|
||||
for word, data in candidates.items():
|
||||
concrete = data.get('concrete_count', 0)
|
||||
abstract = data.get('abstract_count', 0)
|
||||
total = concrete + abstract
|
||||
if total > 0:
|
||||
data['tangibility_score'] = round(concrete / total, 2)
|
||||
else:
|
||||
data['tangibility_score'] = 0.0
|
||||
|
||||
cur.close()
|
||||
return candidates
|
||||
|
||||
|
||||
def step6_write_output(candidates):
|
||||
"""Write the final CSV."""
|
||||
output_path = '/home/john/Development/folksy-generator/data/folksy_vocab.csv'
|
||||
print(f"Step 6: Writing output to {output_path}")
|
||||
|
||||
# Sort by edge_count descending
|
||||
sorted_words = sorted(candidates.items(), key=lambda x: x[1]['edge_count'], reverse=True)
|
||||
|
||||
with open(output_path, 'w', newline='') as f:
|
||||
writer = csv.writer(f)
|
||||
writer.writerow(['word', 'categories', 'tangibility_score', 'conceptnet_edge_count', 'frequency_rank'])
|
||||
|
||||
for word, data in sorted_words:
|
||||
categories = ','.join(data['categories'])
|
||||
writer.writerow([
|
||||
word,
|
||||
categories,
|
||||
data['tangibility_score'],
|
||||
data['edge_count'],
|
||||
0
|
||||
])
|
||||
|
||||
print(f" Wrote {len(sorted_words)} words")
|
||||
return output_path
|
||||
|
||||
|
||||
def main():
|
||||
conn = connect()
|
||||
try:
|
||||
# Step 1 + 2 (filtering is built into the SQL)
|
||||
candidates = step1_gather_candidates(conn)
|
||||
|
||||
# Step 5: Add manual additions (before scoring so they get scored too)
|
||||
candidates = step5_add_manual(conn, candidates)
|
||||
|
||||
# Steps 3 + 4: Tangibility and edge counts
|
||||
candidates = step3_4_tangibility_and_edges(conn, candidates)
|
||||
|
||||
# Step 6: Write output
|
||||
path = step6_write_output(candidates)
|
||||
|
||||
# Summary stats
|
||||
scores = [d['tangibility_score'] for d in candidates.values() if d['tangibility_score'] > 0]
|
||||
edges = [d['edge_count'] for d in candidates.values()]
|
||||
print(f"\nSummary:")
|
||||
print(f" Total words: {len(candidates)}")
|
||||
print(f" Words with tangibility > 0: {len(scores)}")
|
||||
if scores:
|
||||
print(f" Avg tangibility: {sum(scores)/len(scores):.2f}")
|
||||
if edges:
|
||||
print(f" Avg edge count: {sum(edges)/len(edges):.1f}")
|
||||
print(f" Max edge count: {max(edges)}")
|
||||
print(f" Min edge count: {min(edges)}")
|
||||
print(f" Output: {path}")
|
||||
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
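The tangibility score in `step3_4_tangibility_and_edges` is the share of a word's scored edges that use a concrete relation, concrete / (concrete + abstract), rounded to two places, with 0.0 when a word has no scored edges. A minimal in-memory sketch of the same tally is shown below, assuming edges are already available as `(start_word, relation, end_word, weight)` tuples carrying relation names rather than the database's numeric ids. `tangibility_scores` is a hypothetical helper, not part of the script.

```python
from collections import Counter

# Relation names mirroring CONCRETE_RELS / ABSTRACT_RELS in the script
# (the script itself works with numeric relation ids).
CONCRETE = {'AtLocation', 'MadeOf', 'PartOf', 'UsedFor', 'HasA',
            'ReceivesAction', 'CreatedBy'}
ABSTRACT = {'HasProperty', 'Causes', 'MotivatedByGoal', 'CausesDesire',
            'Desires', 'HasSubevent'}


def tangibility_scores(edges, words, min_weight=1.0):
    """Hypothetical helper: score each word as concrete / (concrete + abstract).

    edges: iterable of (start_word, relation, end_word, weight) tuples.
    words: the candidate vocabulary.
    """
    concrete = Counter()
    abstract = Counter()
    for start, relation, end, weight in edges:
        if weight < min_weight:  # same weight >= 1.0 cut-off as the SQL
            continue
        # An edge counts for a candidate whether it is the start or end node
        for word in (start, end):
            if word not in words:
                continue
            if relation in CONCRETE:
                concrete[word] += 1
            elif relation in ABSTRACT:
                abstract[word] += 1

    scores = {}
    for word in sorted(words):
        total = concrete[word] + abstract[word]
        scores[word] = round(concrete[word] / total, 2) if total else 0.0
    return scores


if __name__ == '__main__':
    edges = [
        ('barn', 'AtLocation', 'farm', 2.0),
        ('hay', 'AtLocation', 'barn', 1.0),
        ('barn', 'HasProperty', 'red', 1.0),
        ('barn', 'UsedFor', 'storage', 0.5),  # below the cut-off, ignored
    ]
    print(tangibility_scores(edges, {'barn', 'hay'}))
```

Running it prints `{'barn': 0.67, 'hay': 1.0}`: barn has two concrete edges (once as start, once as end) and one abstract edge, while hay has a single concrete edge. Relations outside both lists, such as IsA, affect neither count, just as in the SQL version.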
103
scripts/extract_relations.py
Normal file
@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
Extract ConceptNet relationships between words in the folksy vocabulary.

Reads folksy_vocab.csv, queries PostgreSQL conceptnet5 database, and writes
folksy_relations.csv with columns: start_word, end_word, relation, weight, surface_text
"""

import csv
import psycopg2

INPUT_PATH = "/home/john/Development/folksy-generator/data/folksy_vocab.csv"
OUTPUT_PATH = "/home/john/Development/folksy-generator/data/folksy_relations.csv"

RELATION_TYPES = [
    "UsedFor", "AtLocation", "CapableOf", "HasA", "PartOf", "Causes",
    "CausesDesire", "HasPrerequisite", "ReceivesAction", "Desires",
    "LocatedNear", "CreatedBy", "MadeOf", "HasProperty", "MotivatedByGoal",
    "HasSubevent",
]


def main():
    # Step 1: Read the word list from folksy_vocab.csv
    words = []
    with open(INPUT_PATH, "r", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            words.append(row["word"].strip())

    print(f"Read {len(words)} words from {INPUT_PATH}")

    # Build node URIs
    word_uris = [f"/c/en/{w}" for w in words]

    # Build relation URIs
    relation_uris = [f"/r/{r}" for r in RELATION_TYPES]

    conn = psycopg2.connect(dbname="conceptnet5")
    cur = conn.cursor()

    # Step 2: Look up all node IDs for these words
    cur.execute("SELECT id, uri FROM nodes WHERE uri = ANY(%s)", (word_uris,))
    node_rows = cur.fetchall()
    uri_to_id = {uri: nid for nid, uri in node_rows}
    id_to_uri = {nid: uri for nid, uri in node_rows}

    found_words = [uri.replace("/c/en/", "") for uri in uri_to_id]
    missing_words = set(words) - set(found_words)
    print(f"Found {len(uri_to_id)} node IDs out of {len(words)} words")
    if missing_words:
        print(f"Missing {len(missing_words)} words: {sorted(missing_words)[:20]}...")

    node_ids = list(uri_to_id.values())

    # Step 3: Look up relation IDs
    cur.execute("SELECT id, uri FROM relations WHERE uri = ANY(%s)", (relation_uris,))
    rel_rows = cur.fetchall()
    rel_id_to_name = {rid: uri.replace("/r/", "") for rid, uri in rel_rows}
    rel_ids = list(rel_id_to_name.keys())

    print(f"Found {len(rel_ids)} relation types: {sorted(rel_id_to_name.values())}")

    # Step 4: Query edges where both start and end are in our folksy node set,
    # relation is one of our types, and weight >= 1.0
    cur.execute(
        """
        SELECT e.start_id, e.end_id, e.relation_id, e.weight, e.data->>'surfaceText'
        FROM edges e
        WHERE e.start_id = ANY(%s)
          AND e.end_id = ANY(%s)
          AND e.relation_id = ANY(%s)
          AND e.weight >= 1.0
        ORDER BY e.weight DESC
        """,
        (node_ids, node_ids, rel_ids),
    )

    rows = cur.fetchall()
    print(f"Found {len(rows)} edges")

    cur.close()
    conn.close()

    # Step 5: Convert node IDs back to word strings and write CSV
    results = []
    for start_id, end_id, relation_id, weight, surface_text in rows:
        start_word = id_to_uri[start_id].replace("/c/en/", "")
        end_word = id_to_uri[end_id].replace("/c/en/", "")
        relation = rel_id_to_name[relation_id]
        results.append((start_word, end_word, relation, weight, surface_text or ""))

    # Step 6: Write output CSV sorted by weight descending (already sorted by query)
    with open(OUTPUT_PATH, "w", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(["start_word", "end_word", "relation", "weight", "surface_text"])
        for row in results:
            writer.writerow(row)

    print(f"Wrote {len(results)} relationships to {OUTPUT_PATH}")


if __name__ == "__main__":
    main()
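Step 4 of `extract_relations.py` keeps only edges whose start and end nodes are both in the vocabulary, whose relation is in `RELATION_TYPES`, and whose weight is at least 1.0, ordered by weight descending. The sketch below applies the same filter in memory so it can be checked on a handful of rows without a database. It assumes edge rows carry URIs (`/c/en/...`, `/r/...`) rather than the numeric ids the script queries; `uri_to_word`, `filter_vocab_edges`, and `to_csv` are hypothetical helpers.

```python
import csv
import io

WORD_PREFIX = '/c/en/'
REL_PREFIX = '/r/'


def uri_to_word(uri):
    """Return the bare word for an English node URI, or None otherwise."""
    return uri[len(WORD_PREFIX):] if uri.startswith(WORD_PREFIX) else None


def filter_vocab_edges(edges, vocab, relation_types, min_weight=1.0):
    """Keep edges with both endpoints in vocab, mirroring the SQL in step 4.

    edges: iterable of (start_uri, end_uri, relation_uri, weight, surface_text).
    """
    rows = []
    for start_uri, end_uri, rel_uri, weight, surface_text in edges:
        start, end = uri_to_word(start_uri), uri_to_word(end_uri)
        relation = rel_uri[len(REL_PREFIX):]
        if (start in vocab and end in vocab
                and relation in relation_types and weight >= min_weight):
            # Missing surface text becomes "", as in the script
            rows.append((start, end, relation, weight, surface_text or ''))
    # Equivalent of ORDER BY e.weight DESC (Python's sort keeps ties in input order)
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def to_csv(rows):
    """Render rows with the same header and QUOTE_ALL quoting as the script."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerow(['start_word', 'end_word', 'relation', 'weight', 'surface_text'])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == '__main__':
    edges = [
        ('/c/en/bucket', '/c/en/well', '/r/UsedFor', 1.0, None),
        ('/c/en/hay', '/c/en/barn', '/r/AtLocation', 2.0, '[[hay]] is in [[barn]]'),
        ('/c/en/barn', '/c/en/paris', '/r/AtLocation', 3.0, ''),  # paris not in vocab
    ]
    vocab = {'bucket', 'well', 'hay', 'barn'}
    print(to_csv(filter_vocab_edges(edges, vocab, {'UsedFor', 'AtLocation'})))
```

The heaviest edge (barn to paris) is dropped because one endpoint falls outside the vocabulary, which is the point of querying with `node_ids` on both sides: the output is the induced subgraph of the folksy vocabulary, not every edge that touches it.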