Carlos Olmos' Approach
I've set up this page to coordinate a project for the translators of the Translation Tool. The project we want to develop is a multilingual, topic-related online dictionary. I've named it Translation Tool Lexicon Utility (TTLU). On this page I've collected all the information about it, both general and technical. Let's start with the general info:
G1 - Data we'll store for each word
- The topic which the word relates to.
- A Notes field to store some explanation about the word, the meaning of an acronym, or whatever.
- The word in the languages available in the database.
Notes on G1
- I've dropped the continent subdivision proposed by lupita in her first mail because I think most words will not be continent-specific; for those that are, maybe we can use the Notes field.
- Each word can belong to only one topic.
- There will be a special topic to store words that don't fit in any other topic.
- The engine will allow us to add languages or topics whenever we need to.
G2 - Accessing the DB info
To access the database information we'll use a webpage with the following items:
- Source and target languages: Listboxes that will allow us to select any of the languages available in the database.
- Topic: Another listbox with all the topics of the database
- Text we're looking for: A textbox where we can write the word or text we're looking for, in the source language.
With this information we can query the database, and the result will be displayed in the same browser.
Notes on G2
- The target language and topic lists will each include a special entry which will allow us to search in all topics and get the translation in all the languages of the database.
- I also plan to add some advanced search options to allow searching for non-exact words: words that begin with, end in, or contain some characters. This will be useful if we don't know the correct spelling of a word, and also to save the time of typing very long expressions :).
- There will also be an option to get the complete database contents or a whole topic; this way we can see which topics exist in which languages, which of them need translations, etc.
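The advanced search options above map naturally onto SQL LIKE patterns. The following is only a minimal sketch, written in Python for illustration (the project itself proposes PHP); the function name `like_pattern` and the mode names are hypothetical. It assumes backslash is the LIKE escape character, which is MySQL's default; other engines may need an explicit ESCAPE clause.

```python
def like_pattern(text: str, mode: str = "exact") -> str:
    """Turn the text typed by the user into an SQL LIKE pattern.

    The mode names ("exact", "begins", "ends", "contains") are hypothetical
    labels for the search options described in the notes on G2.
    """
    # Escape the LIKE wildcards so a literal % or _ in the text matches itself.
    escaped = (
        text.replace("\\", "\\\\")
            .replace("%", "\\%")
            .replace("_", "\\_")
    )
    patterns = {
        "exact": escaped,
        "begins": escaped + "%",
        "ends": "%" + escaped,
        "contains": "%" + escaped + "%",
    }
    if mode not in patterns:
        raise ValueError(f"unknown search mode: {mode}")
    return patterns[mode]
```

The returned pattern would then be passed as a bound parameter to a query of the form `... WHERE <source language column> LIKE ?`, never pasted into the SQL text.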
G3 - Updating the database info
This is the least defined point; I haven't worked on it yet because it will take some time to get to it. Anyway, the discussion should resolve at least the following questions:
- Who will be able to add info to the database?
- Should a password be required to do so?
- In which format should the topics be added?
Notes on G3
- The first two questions are meant to prevent malicious access to the database, for example someone adding a 2,479-word topic of garbage. I'm currently working on a way to make it easy for the database administrators (DBAs) to delete this kind of addition; if we find a good solution we can allow all users to add topics without passwords. Please make any suggestions you have on this idea, and see section T4 for a first proposal.
- The third question is needed because the engine will not accept every available format, so we have to agree on the one we'll use. I think a good one is CSV. It is a plain text file with the words separated by commas and a carriage return to begin each new row, for example:
Cessez-le-feu,Cease-fire,Alto el fuego,Cessate il fuoco,Cessar fogo
Conseil de sécurité,Security council,Consejo de seguridad,Consiglio di sicurezza,Conselho de segurança
It is also a commonly used format, so many applications will let you export data to it with minimal effort (MS Office Excel and OpenOffice.org Calc will for sure).
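To show how such a file could be read, here is a minimal sketch in Python (for illustration only; the project proposes PHP). The function name `parse_topic_csv` is hypothetical, and the column order (French, English, Spanish, Italian, Portuguese) is an assumption inferred from the two sample rows above; the real order would have to be agreed on.

```python
import csv
import io

# Assumed column order, inferred from the sample rows (not fixed by the proposal).
LANGS = ["fr", "en", "es", "it", "pt"]

SAMPLE = (
    "Cessez-le-feu,Cease-fire,Alto el fuego,Cessate il fuoco,Cessar fogo\n"
    "Conseil de sécurité,Security council,Consejo de seguridad,"
    "Consiglio di sicurezza,Conselho de segurança\n"
)

def parse_topic_csv(text, languages):
    """Parse CSV text into one dict per word, keyed by language code."""
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(languages):
            # Reject malformed rows instead of silently shifting columns.
            raise ValueError(
                f"line {line_no}: expected {len(languages)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(languages, (f.strip() for f in fields))))
    return rows

rows = parse_topic_csv(SAMPLE, LANGS)
```

Using a real CSV parser rather than splitting on commas matters as soon as an expression itself contains a comma, since spreadsheet exports quote such fields.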
At first the administrator(s) could add the topics to the database, but we'll have to hand the updating process over to the translators before it becomes too much for the administrator(s).
That's all the general info. It's just a proposal; any comments, corrections, suggestions, etc. are welcome and much appreciated.
The following is the tech-related info.
T1 - Chosen 'platforms'
This is the first proposal made by Alster and it's fine with me.
- PHP for the webpage front-end
- MySQL for the database back-end
T2 - Front-end interface
As detailed in the general info section, the front-end will be a dynamic webpage and it will need two PHP files:
- The first PHP file is the one that shows the TTLU interface as described in G2. It requires a first access to the DB to check that it's available and to get the list of available languages and topics. Once this information is retrieved, the PHP file generates the code to show the described form.
- When the user clicks on the submit button, the second PHP file will be executed. It has to analyze the options and text entered by the user, generate the SQL query, send it to MySQL and display the results obtained.
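The query-generation step of the second file could look roughly like the sketch below. It is written in Python for illustration (the project proposes PHP) and every name in it is hypothetical: the `build_query` function, the `words` table, the `topic_id` and `notes` columns, and the `lang_<code>` column naming. Language codes are checked against the list read from the database because column names cannot be bound as parameters; the user's text and the topic are passed as parameters so they never end up inside the SQL string.

```python
def build_query(source, target, topic_id, text, languages):
    """Build a parameterized SELECT from the form fields.

    target=None stands for the "all languages" entry and topic_id=None
    for the "all topics" entry of the listboxes (an assumed encoding).
    """
    if source not in languages:
        raise ValueError(f"unknown source language: {source}")
    if target is None:
        targets = [code for code in languages if code != source]
    elif target in languages:
        targets = [target]
    else:
        raise ValueError(f"unknown target language: {target}")

    # Column names come only from the validated language list.
    columns = ", ".join([f"lang_{code}" for code in targets] + ["notes"])
    sql = f"SELECT {columns} FROM words WHERE lang_{source} = ?"
    params = [text]
    if topic_id is not None:
        sql += " AND topic_id = ?"
        params.append(topic_id)
    return sql, params
```

For example, `build_query("en", "es", 3, "Cease-fire", ["fr", "en", "es"])` yields a statement selecting `lang_es` and `notes` with two bound parameters. The `?` placeholder style is the one used by Python's sqlite3 and by PHP's mysqli prepared statements; other drivers use a different marker.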
T3 - Database structure
To implement the functionality described, I propose a DB with two tables, one for the topics and one for the words.
- The topics table will contain two columns:
  - Topic identifier
  - Topic description (topic name)
- The words table will have a variable number of columns:
  - An integer pointing to the topic
  - A text column for the Notes field
  - As many text columns as there are supported languages
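A minimal sketch of this two-table layout follows, using Python's built-in sqlite3 module as a stand-in for MySQL so that it can be run anywhere. The table names, column names and the `lang_` prefix are all assumptions; the prefix is there only to keep language codes such as "is" or "to" from clashing with SQL keywords.

```python
import sqlite3

def lang_column(code):
    """Map a language code to a column name, rejecting anything unexpected."""
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"invalid language code: {code!r}")
    return f"lang_{code.lower()}"

def create_schema(conn, languages):
    """Create the topics table and a words table with one column per language."""
    conn.execute(
        "CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    lang_cols = ", ".join(f"{lang_column(code)} TEXT" for code in languages)
    conn.execute(
        "CREATE TABLE words ("
        "topic_id INTEGER NOT NULL REFERENCES topics(topic_id), "
        f"notes TEXT, {lang_cols})"
    )

def add_language(conn, code):
    """Adding a language later means adding one more text column."""
    conn.execute(f"ALTER TABLE words ADD COLUMN {lang_column(code)} TEXT")

conn = sqlite3.connect(":memory:")
create_schema(conn, ["fr", "en"])
add_language(conn, "es")
```

Existing rows simply get an empty value in the new column, which also makes it easy to list the words that still need a translation in that language.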
T4 - Security
As I've described in the general info section, once we allow TTLU users to add topics to the database it would be useful to have a way to delete erroneous or malicious additions with minimal time and effort for the administrator. I propose adding a control byte/int to each word and setting the same value on all the words of a newly inserted topic. If the topic is OK when we check it, we can set the value to a valid topic id, and if it's not we can delete all its words with a single DELETE. The value would be incremented for each new topic so each topic has its own id.
If you have any other idea on how to do this, or situations we should guard against, please add your comment to the page or send an email to elgrillado at yahoo.com.
T5 - Hosting
Once finished, the project should be hosted on a server with PHP/MySQL available. Alster suggested that it should be an indymedia site and I agree, but I have no idea whom I should contact about that; I hope someone knows. If no one does, I'll try contacting the tech list.
- 04 Nov 2004