SIMPLE Editor Technical Documentation

By Preben Wik, 15 June 2003

This is an overview of some technical issues concerning the creation of a SIMPLE Database Editor. For information on using the editor, please see the manual: Manual.html

Predicative

I feel that the most unclear part of the SIMPLE structure is the "Predicate representation". I have not been able to incorporate it into the editor as well as I would have liked. As of today it is possible to replace the predicate of a Semu with an already existing one, but it is not possible to make new ones. There are several reasons for this. First of all, the creation of predicates is quite complicated, and I have not found an easy and intuitive way to make a GUI for it. <To explain the structure: These are the sgml components of the predicate.> We could make GUI entries for all this information, but then who would use them? Secondly, I feel that perhaps using the existing predicates is sufficient. The Danes have made a new predicate for almost every verb, and I suspect that is a mistake. There are generic predicates such as, for example:
"PRED2hum_food_CRH_1" (divalent, human subject, food object) with:
arg1 = Human, semanticrolel="Role_ProtoAgent", status="CHECK"
arg2=ArtifactFood, semanticrolel="Role_ProtoPatient", status="DEFAULTCHECK".
Then there are specialised predicates such as:
"PREDbage_CCS_1" (bake), which apart from the name in the pred_id contains exactly the same information. Details: 2PredsSameInfo.txt
I do not understand the reason for doing this. If we were to make a new Semu "grille" (BBQ), it would make no sense to make a new "PREDgrille_CCS_1", and it would look funny to give it the predicate "PREDbage_CCS_1", but it would work just fine to give it the predicate "PRED2hum_food_CRH_1".
If that is the case, perhaps some cleaning up of the existing predicates is needed as well?

These, and more, are issues where a decision must be made by someone other than me.

1.

Selectional restriction = "none" is coded in the data on the surface, but not all the way through the structure.
For example, USEM_V_adlyde_REA_1
has a predicate=PREDagent_REA_1
and <Predicate "PREDagent_REA_1" has an argumentl="ARG1PREDagent_REA_1 ARG2PREDnone_REA_1">
but where
<Argument id="ARG1PREDagent_REA_1" continues with a: informargl="ArgHuman ArgAnimal"
<Argument id="ARG2PREDnone_REA_1" has no informarglist at all.

Hence, the traversal of elements that creates the data structure is broken, and no arguments show up.
Simply leaving it empty is not satisfactory, because an empty slot in the Arg2 field can be read as saying: "the verb does not take an Arg2".

There are also other args without an informarglist. They are all treated as if their restriction were "none" (they exist in the Predicate table and the argument table but not in the informarg table). I have compiled a list of them, called "Args With missing informargl.txt".
The solution used has been to insert an entry "informargl=ArgNone" in the sgml files for both verbs and nouns:
<InformArg
id="ArgNone"
comment="trick for parsing into a database structure"
status=""
weightvalsemfeaturel="WVSFTemplateNonePROT">

and, when parsing the sgml files (InsertPredicateNoun1.pl, InsertPredicateVerb1.pl), to check:
# Fall back to the ArgNone placeholder when the argument has no informargl value.
if (! $informargl) {
    push @informargl, "ArgNone";
}
This is perhaps a mistake, though. I don't know what the argument structure should be for the args listed in
"Args With missing informargl.txt". Perhaps they should be something else?

2.

The only way I have found to tell which argument is ARG1, which is ARG2, etc. is to look at the arg_id.
There is a pattern that I thought was consistent and that I used to extract the ArgType. For example, ARG1PREDagent_REA_1 says that its argtype is arg1 and its semantic type is agent.
However, some ARGs do not follow the pattern arg_id=XXXPREDxxxx, for example ARG2ændring_CAC_1.
I missed that when creating the DB, and hence a few of the items in the argument table have no arg_type or a funny arg_type.
Some have an extra P in the arg_id, e.g. ARG2PPREDhum_ACT_1 (a misspelling?).
Some have unknown types, such as:
"ATR" (the attributive object of the predicate): ATRPREDnone_SPE_1
"ASSOC" (the associate argument of the predicate)
"APP" (the appositional complement of the predicate)
arg_ASSOCPREDhuman_CNV_1 gives the argtype "arg_ASSOC",
ARG2EPREDmoney_TRA_1 gives the argtype "ARG2E",
etc.

These I don't know how to treat.
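
One way to make the irregular ids visible, rather than silently producing a funny arg_type, is to split each arg_id and attach a note whenever it deviates from the expected pattern. The following is only a sketch in Python (the actual import scripts are Perl). The function name split_arg_id and the list of known prefixes are assumptions built from the examples in this note, and the list is probably incomplete.

```python
import re

# Known argument-type prefixes, taken only from the examples in this note
# (assumption: the real data may contain others).
ARG_ID = re.compile(r"^(?:arg_)?(ARG\d+|ATR|ASSOC|APP)(.*)$")

def split_arg_id(arg_id):
    """Split an arg_id into (arg_type, semantic_part, note).

    Returns None when the id starts with no known prefix, so that such
    cases can be listed for manual inspection.
    """
    m = ARG_ID.match(arg_id)
    if m is None:
        return None
    arg_type, rest = m.groups()
    head, sep, tail = rest.partition("PRED")
    if not sep:
        # e.g. ARG2ændring_CAC_1: no PRED marker at all
        return arg_type, rest, "no PRED marker"
    if head:
        # e.g. ARG2PPREDhum_ACT_1 or ARG2EPREDmoney_TRA_1: flag, do not guess
        return arg_type, tail, "extra %r before PRED" % head
    return arg_type, tail, "regular"

for example in ("ARG1PREDagent_REA_1", "ARG2ændring_CAC_1",
                "ARG2PPREDhum_ACT_1", "arg_ASSOCPREDhuman_CNV_1"):
    print(example, "->", split_arg_id(example))
```

Whether "ARG2E" is a real type or a typo is not decided here; the note field only records that something unexpected sits between the type and PRED.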

3.

CHECK, DEFAULTCHECK and SHADOW are values in the InformArg element that say whether the argument is obligatory, default or implicit. This information, as well as the semantic role information, is not shown in the editor. If someone finds it useful, it could be added.


Qualia:

1. SR ugliness:
The documentation "FinalGuidelines.doc", made by the specification group of SIMPLE (in Italy?), holds in Appendices A-F lists of the items that should be included in SIMPLE:
A. QUALIA RELATIONS AND FEATURES, B. DERIVATIONAL RELATIONS, C. SEMANTIC TYPES, D. HIERARCHY OF DOMAINS, E. HIERARCHY OF SEMANTIC CLASSES AND DISTINCTIVE FEATURES, and F. REGULAR POLYSEMOUS CLASSES.
The sgml structure, however (made in France?), does not follow the guidelines, and I am unsure which of the two I should follow. The difference lies partly in the spelling of the relations. For example, for the Telic relations the guidelines say "Is_the_habit_of" while the sgml data says "SRIsthehabitof". As a result, a search for "Is_the_habit_of", as taken from the menubutton, finds 0 instances, whereas a search for "SRIsthehabitof" finds 41.
This inconsistency was not clear when the data was parsed and placed in the database, and something should be done about it. Perhaps the easiest fix is to edit the text files in the 'content' folder that the menubuttons use to create their structure. This would also avoid the confusion between the wildcard "_" and the "_" contained in some of the relation names. There is also a related issue with the "SR" prefix that the items in the database contain. Should it be removed from all the items in the DB?

Searching in the current implementation requires the user to edit the string selected from the menubutton: in most cases, add SR to the front and remove any underscores from the word.
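
Until the content files or the database are cleaned up, the manual rewrite described above could be done automatically before the search string is sent to the database. A minimal sketch in Python (the editor itself is Perl/Tk), assuming the rule really is just "prefix SR, drop underscores"; the function name guideline_to_db_name is hypothetical, and since the text above says the rule only holds "in most cases", some names will still need manual handling.

```python
def guideline_to_db_name(label, prefix="SR"):
    """Convert a guideline-style relation name to the form found in the sgml data.

    Assumption: the stored form is the prefix followed by the label with
    all underscores removed. Labels that already carry the prefix are
    returned unchanged.
    """
    if label.startswith(prefix):
        return label
    return prefix + label.replace("_", "")

print(guideline_to_db_name("Is_the_habit_of"))  # SRIsthehabitof
```

Doing the conversion in one place also keeps the "_" of the relation names away from the "_" wildcard in the search field.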

Polysemy inconsistency:
The Polysemy list is incomplete compared with the actual data found in the sgml structure.
I have found some discrepancies, but not many, in other qualia relations as well (though there might be several more). For example, SREntail = constitutive is found in the sgml file but not in the guidelines.
All qualia relation names, as well as the constitutive features, should be extracted and compared with the lists taken from FinalGuidelines.doc. If additional items are found in the sgml data, they should perhaps be added to the list?

2. Unknown Types
Some qualia relations have been given the Type "unknown!!" and are therefore not showing up in the database.

In the sgml files the qualia relations are coded as <RSemU> elements. For example:
<RSemU
id="SRArtifactualagentive"
naming="Artifactualagentive"
example=" "
comment="Formal node in the hierarchy"
isal="SRAgentive"
type="PARADIGMATIC">

This says that "SRArtifactualagentive" is of the type "SRAgentive" (from the isal=). Although I have not found it mentioned in the guidelines, there seems to be a hierarchy in some of these relations as well. For example, "SRRelatedto" is of type "SRArtifactualagentive":
<RSemU
id="SRRelatedto"
naming="Relatedto"
example=""
comment=""
isal="SRArtifactualagentive"
type="PARADIGMATIC">

While parsing the sgml files to create the database structure, the types were identified using a recursive procedure (sub isPartOf) in the "CreateRwvSemuNounQuery.pl" and "CreateRwvSemuVerbQuery1.pl" files. If it did not find a top item, "unknown!!" was inserted. (id="SRFormal" has an isal="SRTop", and so do Telic, Agentive and so on.)

It turns out that the <RSemU> elements with
id="SRHasasproperty", id="SRMeasuredby", id="SRMetaphor", id="SRSynonym" and id="SRQuantifies" do not have an isal= entry!
Some 50 entries in the database must be edited manually for these relations to show up. (I suspect this affects the people using the sgml data directly as well.)
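
The lookup described above can be sketched as follows. This is not the actual "sub isPartOf" (which is Perl and not reproduced here), only a Python illustration of what it appears to do: follow the isal= links upward until a relation directly under SRTop is found. The function name top_type and the dictionary layout are assumptions; the sample data uses only relations mentioned in this note.

```python
# isal= values keyed by relation id; None stands for a missing isal= entry.
ISA = {
    "SRFormal": "SRTop",
    "SRAgentive": "SRTop",
    "SRArtifactualagentive": "SRAgentive",
    "SRRelatedto": "SRArtifactualagentive",
    "SRHasasproperty": None,
}

def top_type(rel_id, isa, top="SRTop", _seen=None):
    """Return the ancestor of rel_id that sits directly below `top`.

    Returns "unknown!!" when the chain ends without reaching `top`
    (missing isal=) or when it runs into a cycle.
    """
    seen = set() if _seen is None else _seen
    if rel_id in seen:
        return "unknown!!"
    seen.add(rel_id)
    parent = isa.get(rel_id)
    if parent is None:
        return "unknown!!"
    if parent == top:
        return rel_id
    return top_type(parent, isa, top, seen)

print(top_type("SRRelatedto", ISA))      # SRAgentive
print(top_type("SRHasasproperty", ISA))  # unknown!!
```

Running such a check over all relation ids before loading the database would list the relations that end up as "unknown!!", so their isal= entries can be fixed at the source.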


Double Semu_id

Although a Semu_id is supposed to be a unique identifier, this is not always the case. I have compiled a list of Danish doubles: DoubleDanishSemu_Ids.txt. In addition to this there are several (I don't know how many) double Norwegian Semu_ids.
In the initial sgml files, a Norwegian translation and example were added inside the existing Danish Semu, like this:

<SemU
id="USEM_V_bevæge_sig_MOV_1"
idN=""
naming="bevæge"
namingN="bevege"
example="mens vi venter - stort set forgæves - på , at poeten skal bevæge sig hen for at åbne vinduet "
exampleN="Kartha, som er tidligere jagerflyger, sa at det kasakhstanske flyet ser ut til å ha beveget seg vekk fra den oppgitte kursen"
comment="full BC 201043075 BSP"
commentN=""
....>
In the database implementation the two were to be separated (but still linked together through the fields LinkNorsk and LinkDansk), and a Semu_id was created automatically using the algorithm "take the original Danish Semu_id, remove the Danish naming part, and replace it with the Norwegian naming". I discovered later that this had the unexpected side effect of creating double Semu_ids in some cases, because sometimes two Danish Semus have been given the same Norwegian translation. For example, Danish id="USEM_N_Atlanten_3DL_1" and id="USEM_N_Atlanterhavet_3DL_1" were both translated into "Atlanterhavet" and thus given the Norwegian Semu_id "N_USEM_N_Atlanterhavet_3DL_1". The same goes for Danish Tyveknægt and Tyv, Datamaskin, Svømmebaseng, hav etc. This should be sorted out, as it creates some havoc in the data structure (doubles of qualia etc.).
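
To find out how many Norwegian doubles there are, the id-generation rule can be replayed over the Danish ids and the results grouped. The sketch below is Python, not the original conversion script, and it assumes that a Semu_id always has the shape USEM_<class>_<naming>_<code>_<number>, which is inferred from the examples in this note and may not hold for every entry. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a Danish Semu_id; the naming part may contain underscores.
SEMU_ID = re.compile(r"^(USEM_[A-Z]+)_(.+)_([A-Za-z0-9]+)_(\d+)$")

def norwegian_id(danish_id, naming_n):
    """Replace the Danish naming part with the Norwegian naming and add N_."""
    m = SEMU_ID.match(danish_id)
    if m is None:
        raise ValueError("unexpected Semu_id shape: %s" % danish_id)
    head, _danish_naming, code, number = m.groups()
    return "N_%s_%s_%s_%s" % (head, naming_n, code, number)

def find_collisions(pairs):
    """Map each generated Norwegian id to the Danish ids that produce it,
    keeping only the ids produced more than once."""
    by_new_id = defaultdict(list)
    for danish_id, naming_n in pairs:
        by_new_id[norwegian_id(danish_id, naming_n)].append(danish_id)
    return {new: old for new, old in by_new_id.items() if len(old) > 1}

pairs = [
    ("USEM_N_Atlanten_3DL_1", "Atlanterhavet"),
    ("USEM_N_Atlanterhavet_3DL_1", "Atlanterhavet"),
    ("USEM_V_bevæge_sig_MOV_1", "bevege"),
]
print(find_collisions(pairs))
```

For the sample data this reports "N_USEM_N_Atlanterhavet_3DL_1" as produced by both Danish ids, which is the case described above.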

Ideally, I feel the Semu_id should perhaps be deprecated and replaced with a proper auto-incremented integer ID anyway.

Norwegian Dummies

Dummies are Semus with incomplete or no semantic information in them. They are created so that the qualia relations have something to point to when they refer to a non-existent Semu. During the creation of a network there will always be an outer rim that points to nothing: a dead end, or a dummy. Say, for example, that you are creating the Semu "Hotel" and, when filling out the qualia information "Isa <Building>", find that Building is not yet in the dictionary. Instead of first making a complete Semu "Building", and then finding when you come to its "Isa..." that you must fill out yet another Semu "Location", etc., the solution used has been to make a dummy with a name and possibly some more information, but with no pointers further.

Norwegian dummies have been added (translated) automatically, although many did not have a Norwegian translation. This means they are clones of the Danish dummies, with a Danish Naming but with the language tag and Semu_id changed (given the prefix ND_). The reason for doing this is to get a complete and closed Norwegian semantic network first, and then let someone change the names and add information to the dummies at a later stage.

Semus With Wrong Word Class

The Wordclass field was generated automatically, from the naive presupposition that any Semu in the verb_file.sgml was a verb, etc. This is not true, however, and several Semus (particularly dummies) are labeled with the wrong wordclass.
(Must be edited manually.)

Wrong Linking

Although an attempt has been made to relink all Norwegian qualia relations that initially pointed to Danish Semus, this has only been successful in the cases where a Norwegian translation has been made. Hence a number of links still point from the Norwegian database over to the Danish. Look, for example, at: slakter - isa "person", Pattegris - isa "gris".

I have seen some examples of qualia relations that loop. For example:
dataskjerm isa skjerm - isa gjenstand isa enhet isa gjenstand...isa enhet...
Or: Bygning isa bygning... Perhaps a proc that checks for loops would be good for straightening things out?
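
Such a loop check could look roughly like this. It is a Python sketch rather than a Tcl/Perl proc, and it makes the simplifying assumption that each Semu has at most one isa target, stored here in a plain dictionary; the function name find_isa_loop is hypothetical. With several isa links per Semu the same idea works, but needs a depth-first search instead of a single walk.

```python
def find_isa_loop(start, isa):
    """Follow isa links from `start`.

    Returns the loop as a list whose first and last element are the same
    Semu, or None if the chain ends without repeating.
    """
    path = []
    position = {}
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = isa.get(node)
    return None

isa = {
    "dataskjerm": "skjerm",
    "skjerm": "gjenstand",
    "gjenstand": "enhet",
    "enhet": "gjenstand",
    "bygning": "bygning",
    "slakter": "person",
}
print(find_isa_loop("dataskjerm", isa))  # ['gjenstand', 'enhet', 'gjenstand']
print(find_isa_loop("bygning", isa))     # ['bygning', 'bygning']
print(find_isa_loop("slakter", isa))     # None
```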

Various Yet To Do: (Comment: most have been done as of Dec 03)

This list is of course incomplete, but I will mention here some of the things that I think would improve the editor and that there has not been time to do. It is not a list prioritized from top to bottom, but the things at the very top are things that I feel need to be done in order to have a useful editor.

  1. A big job is to translate the remaining 4000 Semus from Danish to Norwegian. A way to get all Danish Semus that are not yet translated into the results list would help in this process. Suggestion: collect all Semus that have no LinkNorsk and Language = Dansk, perform automatic cloning to Norwegian, and add a statement "Dummy" in the comments field. This way a lexicographer only needs to change the Naming field in the existing Semu. Alternatively, add a button in the Search Tab that appends the condition "AND that has no linkNorsk" to the SQL query. That way all the search capabilities can still be applied, with the additional button added for this temporary job.

  2. Bokmålsordboka. There are many Semus that have only one option in the bokmåls definition table. They could be inserted automatically. Small job for a programmer - big job for a lexicographer?

  3. What about the Semus that are not mentioned in BO?
    There should be a way to manually insert word definitions. But what then about the links to BO?
    Suggestion: create an extra field in the Bokmåls table "DEFINISJONSDEL" that is blank (NULL) unless the definition was created for SIMPLE. Perhaps also a field for why. I can see two reasons: the word does not exist in Bokmålsordboka, or none of the definitions of that word fits the word sense described in the SIMPLE lexicon.

  4. "sub populate" is not complete yet. For example, "%" on arguments doesn't work. To make things worse, the list of arguments is only available from the "edit pred.rep dialog", which is awkward. Some other fields in the Search Tab are not searchable yet (date, linkNorsk, LinkDansk). Tedious (but easy) job to do.

  5. There is no way to edit the LinkNorsk elements! Must be available.

  6. The "Sort" procedure for the results list is not done. Pressing a button in the header of the list should sort the items in the list by that category.

  7. Update changes in the results list. If you change, for example, the Wordclass of a Semu, the change will not appear in the results list until the next time the Semu is listed. SaveSemu must delete the item from the resultsList and then insert it again (preferably on the same row).

  8. Qualia edit: a dummy made with MakeNewDummy should automatically show up in the search list and be selected, so that its Semu_id appears in the Entry field, ready for the user to press "Add". (That is the most likely reason why someone would want to make a new dummy Semu.) At the moment one must make a new search for it to show up.

  9. Delete Semu: "Foreign key Delete check". When you delete a Semu, it could be that some other Semu has a link to the one you are deleting. There is no check or protection against this at the moment. I think all it takes is what the "ShowAllLinksToThis" proc already does: search the RWeightValSemu table for the Semu_id in the Target field.

  10. Add the missing items in the qualia menubuttons (see the qualia note)

  11. Remove all the "SR" in front of the Semr relations in the RWeightValSemu Table? (see the qualia note)

  12. Export list/ Save list as file: does not give you an option to decide which fields should be part of the list.

  13. Bind stuff: There are quite a lot of <Bind> keyboard shortcuts that could be improved: tabbing between NoteBook pages, focusing on entries/menubuttons, opening of menubuttons, etc.

  14. Search for "NOT xxx" possibility in the search Tab would be nice...

  15. Cascade MenuButtons should auto-popup on the same entry that is written in the entry field. A lot of tedious opening up of menubutton pages could be saved.

  16. Accelerator menubuttons. The numbers to the right of the names in the cascading menubuttons show the place the item holds in the hierarchy. The idea was to be able to use them as keyboard shortcuts for fast access to common places in the menubutton structure. Never got around to making it work.

  17. Have the anchor (gray line) of selected item in the results list remain selected after "Enter" is hit (to open qualia edit window etc.)

  18. PredDialog: It should be possible to make new predicates and to get more information from the pred.Rep/Selectional restrictions. There is more in the sgml than what is visible in the editor, e.g. Semantic role and Check/Defaultcheck info. (see the predicate note)

  19. Export back to sgml. There is currently no way to get the stuff back out on an sgml form again.

  20. <bind> Alt+back-arrow, like Back in a Web browser: go back one step to the last edited Semu in the results list. It often happens that you are editing something, find a link you need to do something with, and then wish to return to what you were editing before. Now you're lost.

  21. Robustness: If the required packages are not installed, there is no warning; things just won't work.

  22. Hourglass (busy signal) when it's busy. During a search it is sometimes hard to tell whether the machine is still working or is ready and found nothing.

  23. Make table structure of content files (appendixA-F) for menubuttons, rework the menubuttons?

  24. There is no list for UnificationPath, which means there is no way to edit the UnificationPath (but who will?). I believe it is possible to extract a list from the back part of the templates.txt file that makes the type list. But to be honest I'm not really sure what it should look like.

  25. Semu_id should end up deprecated, with links going to Id instead. Faster to search, automatically incremented.

  26. Log table for Update, delete and create Semu changes being made in the database.

  27. The table Semu has a field called "PredId" whereas the table Predikat has a field called "Pred_id". One of them should have been changed, but by the time this was discovered several queries had already been created. Never got around to changing it.
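
The delete check from item 9 could be sketched as below. It is written in Python against an in-memory SQLite database purely for illustration; the editor is Perl/Tk and its actual database engine and schema are not described here. Only the table names Semu and RWeightValSemu and the Target field come from this document. The columns Source and Relation, the relation value 'isa' and the function names are assumptions.

```python
import sqlite3

def links_to(conn, semu_id):
    """Return (Source, Relation) for every row whose Target is semu_id."""
    cur = conn.execute(
        "SELECT Source, Relation FROM RWeightValSemu WHERE Target = ?",
        (semu_id,),
    )
    return cur.fetchall()

def delete_semu(conn, semu_id):
    """Delete the Semu only if no other Semu points to it.

    Returns True if the row was deleted, False if incoming links blocked it.
    """
    if links_to(conn, semu_id):
        return False
    conn.execute("DELETE FROM Semu WHERE Semu_id = ?", (semu_id,))
    return True

# Tiny stand-in schema (hypothetical columns apart from Semu_id and Target).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Semu (Semu_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE RWeightValSemu (Source TEXT, Relation TEXT, Target TEXT)")
conn.executemany("INSERT INTO Semu VALUES (?)", [("slakter",), ("person",)])
conn.execute("INSERT INTO RWeightValSemu VALUES ('slakter', 'isa', 'person')")

print(links_to(conn, "person"))  # [('slakter', 'isa')]
```

In the editor, a blocked delete would presumably show the list of linking Semus to the user instead of just returning False. The sketch also leaves the deleted Semu's own outgoing rows in RWeightValSemu untouched, which a real implementation would have to decide about.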

Known "bugs"

"Edit" from Right-clicking qualia in the edit Tab doesn't work if the qualia relation Semu is already in the results list.
Hlist will not place a duplicate in the list, and so an error occurs before showTemplate kicks in.

The same goes for "showAllLinksToThis". Clear the results list first, otherwise it is sometimes hard to see whether any results came up. Needs an additional proc to check whether the item is already in the results list.

Pressing Ctrl+s (save) when in a Text widget adds a funny square in the Text field. Ctrl+s must be unbound from the Text widget. One should also unbind 'Tab' from the Text widgets, so that it does not interfere with traversal between widgets. At this time only the entries Naming, example and comment are Text widgets. Naming and comment could just as well be Entry widgets, since their content will never span more than one line.

If you want to make a new Norwegian Semu based on a Danish Semu, note that "MakeNewSemuBasedOnThis" on a Danish Semu will create a Norwegian Semu, but with Danish qualia links.