En le Rongeant

05.21.2004

 

Here we gnaw on Universe and discuss the taste.

 

En le Rongeant - (posted 05.21.2004) Gnawing Text, how the web bots work (up to the interpretation part) with example text

Recently I have been contacted by a couple of academically based organizations with interests in the ALTA/Web bot technology. In both cases, collaboration was offered, but the expressions were so vague as to be meaningless, and the offers, having no clear benefit for this side of the equation, were declined. The reason that this is pertinent has to do with the evolution of the bot processing and the nature of the tasks involved.

The bot technology is a royal pain in the brain to run. In responding to one of the queries from academe, I mentioned that there were in excess of 150 executables involved, none of which had anything close to a human interface, as most are run in Prolog in console mode and thus are merely dialogues of predicate names, meaningless to humans. This triggered some thinking, and a quick investigation. There are actually 207 programs which are run in a typical processing and a further 120 which may be run periodically as maintenance on either the code base or the lexicon or the emotive valence database predicates.

Prolog, for those not aware of it, is a very sophisticated programming language which maintains its own database in memory during execution. Further, the predicate calculus underlying SQL is but a small fraction of what is built into Prolog as a language.

However, back to the point: the code has evolved over a number of years and has almost no error handling built in, just a few cryptic number references or predicate names to jog the mind, and absolutely no human interface (all the C and C++ code is run in a DOS box or in background mode in protected threads on the server). Given all that, the task of collaboration is too daunting for words.

It is terribly difficult to explain the details of the code process to anyone, even other programmers, but it is conceptually easier to explain the 'whats' of the process as long as the 'hows' of the program code are left out of the discussion.

This is a basic primer of the concepts involved in the bot processing. Many of these derive from sked analysis techniques. A sample of board talk is presented immediately below:

Do you have any predictions about the future?

--------------------------------------------------------------------------------
Moms View Message Board: The Kitchen Table (Debating Board): Do you have any predictions about the future? By Mommmie on Sunday, March 21, 2004 - 06:20 pm:

I heard on the news about California public schools getting rid of sports from their schools, I think due to budget problems. I predict more schools will have to do this.

I also predict a lot of people who bought homes in the last few years with the low interest rates will lose them within 10 years bec they bought homes beyond their means.

I think Michael Jackson will be convicted of child abuse.

Does anyone else have any predictions about the future they'd like to share?

By Dana on Sunday, March 21, 2004 - 08:57 pm:


I definitely believe many people will be losing their homes. Upkeep on a home is so expensive. It is already happening in several states.

Other than that, I really have no ideas.

By Dawnk777 on Sunday, March 21, 2004 - 09:00 pm:


That's why we bought a small house 7-1/2 years ago!

By Fionadeassis on Sunday, March 21, 2004 - 11:54 pm:


Please don't scare me on the house buying thing seeing as we just bought our first house 3 weeks ago. The mortgage will be about $100 more than the rent we pay right now. Would you call that way beyond our means? Now I am freaked out a bit!

fiona

By Ladypeacek on Monday, March 22, 2004 - 04:16 am:


Well with my dh in the military we have decided to wait on house buying for as long as we can. I do predict that alot of ugly is coming though. I think with the aids crisis seeming to get out of control that people who are parents of young children now will have a tough time keeping our children from being exposed! I heard that in 20 years 1 in 3 adults that are children now will have aids. Well since i have 4 kids ( 2 of my own and 2 step), that scares the mess out of me! One of my mothers best friends had aids. he was so wonderful to our family. He contracted it many years ago when it first became known. It was so sad to see his struggles. Everytime he came to visit he seemed worse though he tried to hide it. I can't imagine one of my children going through that!

By Ginny~moderator on Monday, March 22, 2004 - 06:10 am:


Kenna, I suspect that the "1 in 3 adults" number includes everyone in the world, including South Africa and Asia, where AIDS is truly rampant.

By Bea on Monday, March 22, 2004 - 05:04 pm:


I understand that fear Kenna, I can remember giving my sons a talk about AIDS when we lived in the DC suburbs. At that time one in five prostitutes in DC were infected. I tried to warn them that all it took was for one member of their social group to try a DC hooker and then sleep with a school mate. They could all become infected. I doubted that I could keep them celibate until marriage, but I sure wanted them to know and practice safe sex, if they were going to be sexually active.

I won't label this a prediction, but rather a fear that this country is a civilization in the process of self destructing, much like the Greeks, the Romans and others throughout history.

By Bobbie on Monday, March 22, 2004 - 09:28 pm:


A lot of the houses in our area are being taken back by the banks. People buy them not looking at the big picture. They just jump in with both feet.

And I agree about the self destruction.. Have major concerns about the kids that are being brought up being in control one day.

By Mommmie on Tuesday, March 23, 2004 - 10:11 am:


Re: the house buying thing, this is what I see in the people I know in real life who bought houses since the interest rates went way down:

-the families live paycheck to paycheck before they bought
-they put nothing down on the house
-they put nothing down on the house bec they have no savings
-they spend more than they planned on
-they charged top of the line appliances for their starter home
-they charged entire rooms of furniture, but they don't have enough cash to pay to have the backyard sodded bec the builder only included the front yard
-they have no money for plumbing repairs and other upkeep (and things go wrong from the beginning)
-they charged lawn equipment and all kinds of other fun stuff at Home Depot

and then what happens...

one of them loses their job, the mom gets a very strong pull to be a SAHM and quit her job, someone gets sick, there's another unexpected pregnancy, the marriage crumbles due to money arguments, etc. Something happens and there's no savings and HUGE bills.

By Momoffour on Tuesday, March 23, 2004 - 11:37 am:


We bought our home last year and we didn't get the interest rate that everyone is talking about ours is on the higher side do to our credit. We didn't have to put alot of money down. the owner paid all closing costs. Which helped out us alot. WE didn't go hawg wild and buy all new stuff for our new home. We are doing that during tax time We are also a one family income I am a sahm and was before we bought our home. as for the future I see my family still living in our home. I do not envy my kids having to grow up during these times but we take it as it comes.

By Dawnk777 on Tuesday, March 23, 2004 - 02:34 pm:


No new furniture for me when we moved. We have extra couches in our house because my parents moved to a house with a smaller living room! LOL! Otherwise, I have had the same couches that I bought when I was single, back in 1984! Ugh! I hate them, too. I'm so ready to see different furniture. DH would have never let me buy top-of-the-line appliances, either. They are way too expensive.

By Momaroze on Wednesday, April 14, 2004 - 10:42 am:


We have our own home too. We bought cheap. Less than renting that is for sure. It is almost payed off. We moved to the country....got away from the city where it is safer to raise children. I don't have a very good outlook on living in big city centres. I like the fact we can have animals and be self sufficient. We have our own water well...and lots of acreage. I feel secure here. As far as the future goes....it does not look very pretty to me. Even alot of the kids these days seem heartless and enjoy inflicting pain on others...I don't know what is going to happen...Boy I must be Pms ing today! LOL.

The sample text above is typical of that brought back by the gathering agent part of the process. This chunk of text represents one 'read'. This, along with millions of others, would be gnawed for information by a number of Prolog processing threads in the next phase. Here we will take a brief tour of some of the information to be extracted from text such as this; information that is not usually discerned by human readers.

We note first that, while the bot brings back identifiers, our further processing neither records nor cares who says what. The programs do note the time and date of each posting, as these are used in the base set identifiers for the lexicon. In reading the example above, bear in mind how the language changes as the date advances.

Second we note that the context (from the sked = subject knowledge elucidates the domain = knowledge was originally 'context' in the latin = sced) of the board is one of general discussion:

Moms View Message Board: The Kitchen Table (Debating Board): Do you have any predictions about the future?

We also note that a query ('Do you have any') starts the discussion. Further, the query has a primary aspect (predictions) which has an attribute (about the future) as a modifier. This latter is very pertinent, as our bot processing infers several things from it. First, that there is an anxiety value that is higher than background. [People tend to seek predictions about the future when experiencing an 'unease' in their personal relationship/view of the future at large. They seek re-assurance, a sign of anxiety.] Second, we note that this is *not* a predictions board, nor a section within a general discussion board dedicated to predictions. Therefore it is inferred that the anxiety quotient is higher still, as the anxiety has prompted the off-topic request. So this contextual analysis of the 'read' first establishes a base level of emotive values for what follows.

I heard on the news about California public schools getting rid of sports from their schools, I think due to budget problems. I predict more schools will have to do this.

I also predict a lot of people who bought homes in the last few years with the low interest rates will lose them within 10 years bec they bought homes beyond their means.

I think Michael Jackson will be convicted of child abuse.

Does anyone else have any predictions about the future they'd like to share?

The same level of processing, in reading the further conversations, will catch a series of emotive clues. The first is well illustrated within the opening dialogue of the conversation. We find within this a series of tense shifts (time perspective shifts) that begin with 'I heard' (past focused), followed by present 'I think', and that immediately followed by future 'I predict'. The processing will also count the number of shifts in tense relative to the number of sentences or segments (nearly complete thoughts, as humans are frequently sloppy about punctuation within internet postings). This gives a ratio that can be used to suggest emotive dissonance or harmony, as the case may be. In the case above, we find past, present, future, future, present, future-query. In re-reading the whole example, note how the tenses finally settle down within the discussion to mostly past focused words. This is also noted as an anxiety related linguistic feature. Based on which words were being used, the fact that they were person focused (we did, we have...because, we bought) directed the values toward 'nostalgia', which is frequently associated with anxiety ('rather look back fondly, than forward trepidatiously'). Again, this would be counted, a ratio calculated, and values 'pegged' to this data set indicating a base-line of emotive context.
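As a rough illustration of the tense-shift count described above, here is a minimal Python sketch. It is not the bot's Prolog code: the cue phrases in TENSE_CUES, the function names, and the hand-split segments are all assumptions made for this example, and the real lexicon is far larger.

```python
# Hypothetical cue phrases; the actual lexicon is not shown in this article.
TENSE_CUES = {
    "past": ("i heard", "we bought", "we did"),
    "present": ("i think", "we have"),
    "future": ("predict", "about the future"),
}

def classify(segment):
    """Return the tense whose cue phrase appears earliest in the segment."""
    text = segment.lower()
    best = None
    for tense, cues in TENSE_CUES.items():
        for cue in cues:
            pos = text.find(cue)
            if pos != -1 and (best is None or pos < best[0]):
                best = (pos, tense)
    return best[1] if best else None

def tense_shift_ratio(segments):
    """Count changes of tense between consecutive tagged segments."""
    tenses = [classify(s) for s in segments]
    tagged = [t for t in tenses if t is not None]
    shifts = sum(1 for a, b in zip(tagged, tagged[1:]) if a != b)
    ratio = (shifts / len(segments)) if segments else 0.0
    return tenses, shifts, ratio

# The opening post, split by hand into 'nearly complete thoughts'.
OPENING = [
    "I heard on the news about California public schools getting rid of sports",
    "I think due to budget problems",
    "I predict more schools will have to do this",
    "I also predict a lot of people who bought homes will lose them",
    "I think Michael Jackson will be convicted of child abuse",
    "Does anyone else have any predictions about the future they'd like to share?",
]

tenses, shifts, ratio = tense_shift_ratio(OPENING)
print(tenses, shifts, round(ratio, 2))
```

Here the ratio is simply shifts divided by segments; how the real processing scales or interprets that number is not stated above, so treat the figure as illustrative only.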

The next layer of processing will determine that, in spite of the context being stated as 'predictions' and its location as being 'off-topic', the real object of this discussion thread is not predictions, but rather is or becomes 'housing'. This is an example of serendipitous mode processing, in which repetition of nouns or nomative structures is allowed/encouraged to form components or entities. If the processing were being run in 'seek' mode, then this conversation might still have been used if the entities sought had a housing component. Such an entity might be a general economic entity, could have been banking, or any number of other connections that can be found within this discussion. Note some of the references:

The mortgage will be

beyond our means

house buying

being taken back by the banks

buy them not looking

house buying thing

interest rates went way down

live paycheck to paycheck

nothing down

no savings

spend more

than they planned on

don't have enough

no money

repairs and other upkeep

There are others that are just as easily discovered within this conversation. We further note that this conversation took place over days, yet, when read for emotive modifiers, we find a consistent emergence of negative emotive words, as seen in the small extract above and in all the other conversations. So once the 'nomative' process has gone through and determined that this conversation/read is to be noted as referencing 'Housing', within the larger entity of 'Economics', we have an emotive value analysis program group which reads through, and determines such things as base, basic, and developing emotive 'tone'.
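The 'nomative' step in serendipitous mode can be pictured with a small Python sketch. Everything named here (LEXICON, serendipity_topic, the phrase list) is hypothetical and stands in for the much larger lexicon; the matching is naive substring counting, chosen only to keep the example short.

```python
from collections import Counter

# Hypothetical mini-lexicon: phrase -> (entity, component).
LEXICON = {
    "mortgage": ("economics", "housing"),
    "house buying": ("economics", "housing"),
    "interest rates": ("economics", "housing"),
    "nothing down": ("economics", "housing"),
    "upkeep": ("economics", "housing"),
    "taken back by the banks": ("economics", "banking"),
}

def serendipity_topic(text):
    """Let repetition decide: return the (entity, component) with most hits."""
    lowered = text.lower()
    counts = Counter()
    for phrase, key in LEXICON.items():
        counts[key] += lowered.count(phrase)
    counts = +counts  # unary plus drops zero counts
    return counts.most_common(1)[0][0] if counts else None

SAMPLE = (
    "The mortgage will be about $100 more than the rent. "
    "Please don't scare me on the house buying thing. "
    "Re: the house buying thing, interest rates went way down. "
    "They put nothing down on the house. "
    "A lot of the houses are being taken back by the banks. "
    "They have no money for plumbing repairs and other upkeep."
)

print(serendipity_topic(SAMPLE))
```

On this extract the housing phrases outnumber the single banking phrase, so the read is tagged as 'housing' within 'economics', which matches the outcome described above.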

In the case of the example provided here, we find that the processing would be led smoothly to the conclusion that the base and basic tone of the conversation is negative, and that the tone contains some significant negative associations which will boost its value beyond the mere 'grumble/anxiety' background. Why is that? Well, here, within the overall context of a general discussion board, we find a query about the future which leads to an overwhelmingly weighted (total number of words devoted, and higher emotive values) object of 'housing' that also has a significant portion/connection to disease...the conversation about aids 'appearing' within the context of a 'housing' focused conversation.

Well with my dh in the military we have decided to wait on house buying for as long as we can. I do predict that alot of ugly is coming though. I think with the aids crisis seeming to get out of control that people who are parents of young children now will have a tough time keeping our children from being exposed! I heard that in 20 years 1 in 3 adults that are children now will have aids. Well since i have 4 kids ( 2 of my own and 2 step), that scares the mess out of me! One of my mothers best friends had aids. he was so wonderful to our family. He contracted it many years ago when it first became known. It was so sad to see his struggles. Everytime he came to visit he seemed worse though he tried to hide it. I can't imagine one of my children going through that!

By Ginny~moderator on Monday, March 22, 2004 - 06:10 am:


Kenna, I suspect that the "1 in 3 adults" number includes everyone in the world, including South Africa and Asia, where AIDS is truly rampant.

By Bea on Monday, March 22, 2004 - 05:04 pm:


I understand that fear Kenna, I can remember giving my sons a talk about AIDS when we lived in the DC suburbs. At that time one in five prostitutes in DC were infected. I tried to warn them that all it took was for one member of their social group to try a DC hooker and then sleep with a school mate. They could all become infected. I doubted that I could keep them celibate until marriage, but I sure wanted them to know and practice safe sex, if they were going to be sexually active.

Further, we can note that while there are a number of past tense distinguishing words within the conversation about disease, the distance placed is minimal, which implies that the 'tone' of the conversation within the thread is not very modified, which allows us to infer that the original point of appearance of the disease attribute is directly linked to the initial emotive statement...

for as long as we can

alot of ugly is coming

crisis

to get out of control

tough time keeping

exposed

scares the mess out of me

was so sad

 

So in reading through the disease related text, and extracting information, we find that there are few nomative values (discrete objects) worth picking up, so what would occur here is that the primary aspect of 'housing' would gain an attribute of 'disease' which would be one of many which would flavor this input into the overall 'housing' component of the economic entity. We also note that 'disease' is a highly negatively emotive word, and as an attribute, if it should arise within many such reads, might be elevated to a secondary aspect modifier of 'housing'. This would be true especially if the number of supporting words to 'disease' rose to include many other objects. In other words, if disease became a regular litany of disease types (i.e. aids, smallpox, et al), then disease would be elevated to an aspect, and a reprocessing would take place to determine the complete attribute set for disease such that it could be interpreted as a larger component of housing. Of course, it would be still negative. In this case, one might, at the end of such a run, have an interpretation resulting in something like:

The Econ entity has an aspect of (disease) which arises in relation to another, cross-linked aspect, (housing) and that both have developing/future oriented time frames indicating that the housing sector is perceived to be suffering from a (disease) with [virulence] similar to [smallpox, whooping cough]. This is purely an example that might be rendered should the elements within the example cited here grow to a trend during such a processing run.
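The elevation of an attribute to an aspect can be sketched in a few lines of Python. The thresholds MIN_READS and MIN_SUPPORTING, the Attribute class, and the rank function are invented for this illustration; the article does not state the actual rule or numbers the bot uses.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the real promotion criteria are not given here.
MIN_READS = 3
MIN_SUPPORTING = 3

@dataclass
class Attribute:
    name: str
    reads: int = 0
    supporting: set = field(default_factory=set)

    def observe(self, words):
        """Record one read in which this attribute appeared."""
        self.reads += 1
        self.supporting.update(w.lower() for w in words)

def rank(attr):
    """Promote to 'aspect' once enough reads and distinct supporting words exist."""
    if attr.reads >= MIN_READS and len(attr.supporting) >= MIN_SUPPORTING:
        return "aspect"
    return "attribute"

disease = Attribute("disease")
disease.observe(["aids"])
first_rank = rank(disease)      # one read, one supporting word
disease.observe(["smallpox"])
disease.observe(["whooping cough", "aids"])
later_rank = rank(disease)      # three reads, three supporting words
print(first_rank, later_rank)
```

In this toy version a single read mentioning aids leaves 'disease' as an attribute of 'housing', while a growing litany of disease types across several reads promotes it.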

The rest of the processing for the emotive values for this conversational thread would reveal that, even leaving the disease connotations out of the aspect/attribute sets, we still find the whole conversation is negatively toned, and discretely populated by 'stated/bespoke' emotional fear & anxiety words. We find, as a quick and limited catalogue, by way of example:

inflicting pain

heartless

does not look very pretty

don't have a very good outlook

bought cheap

Less than

would have never let me buy

way too expensive

hate them

smaller

do not envy

having to grow up

during these times

one family income

tax time

didn't go

didn't have to put

didn't get

no savings

HUGE bills

money arguments

unexpected

crumbles

loses their job

all of which go toward the negative emotive aspect/attribute set for 'housing' within our 'economic' entity. This list was developed from the bottom up, as that is the way that particular chunk of code operates. However, if one continues the read, it quickly becomes obvious that this read will contribute only negatives and that, when processed against 'weighting' (balance of positive emotive words against negative emotive words), it is consistently and overwhelmingly negative.
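One way to picture the 'weighting' balance is the following Python sketch. The signed values in EMOTIVE_VALUES are made up for the example (the real emotive values live in the lexicon and are not shown here), and the formula, positive total minus negative total over their sum, is an assumption about what 'balance' might mean.

```python
# Hypothetical signed emotive values for phrases found in the read.
EMOTIVE_VALUES = {
    "inflicting pain": -3,
    "heartless": -3,
    "hate them": -3,
    "loses their job": -3,
    "huge bills": -2,
    "no savings": -2,
    "money arguments": -2,
    "way too expensive": -1,
    "feel secure": 2,
}

def weighting(found_phrases):
    """Balance of positive against negative values, in the range -1.0 to 1.0."""
    values = [EMOTIVE_VALUES[p] for p in found_phrases if p in EMOTIVE_VALUES]
    positive = sum(v for v in values if v > 0)
    negative = sum(-v for v in values if v < 0)
    total = positive + negative
    return (positive - negative) / total if total else 0.0

balance = weighting(list(EMOTIVE_VALUES))
print(round(balance, 3))
```

With these invented values the single positive phrase barely registers against the negatives, so the balance lands close to -1.0.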

 

Other processing includes time calculations, which for this example would discover that most of the conversations were of the 'immediate' variety, in that the span basically covers less than 3 days (March 21 through 23), but that the subject was compelling enough that over 3 weeks later, someone felt emotive connections strong enough to contribute. There is an inference within this action that this subject is one of the primary human values or core motivators, which we note is indeed the case, as 'housing' is one of the top three. Also we can note that in this case the verbiage does not have a good connotation as to how people feel about housing in general and in particular within this conversation. A sampling of the bespoke emotive words (highest possible values) has the following appearing:

scare

freaked

scares

fear

warn

fear

self destruction

envy

hate

pain

Obviously, this conversational 'read' will impart a negative weighting to the ultimate construction of the entities involved in the processing. Those words which are state descriptors of emotions are also obviously given the highest possible emotive value indicators, and again, these are mostly negative in this case. So here, 'housing' is not looking good.
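The time calculation mentioned before the list of bespoke emotive words can be illustrated with the posting times from the sample thread. This Python sketch is only a guess at the shape of that step: the function split_burst and the 7 day gap threshold are assumptions, not details taken from the bot.

```python
from datetime import datetime, timedelta

# Posting times copied from the sample thread (all 2004).
POSTS = [
    datetime(2004, 3, 21, 18, 20), datetime(2004, 3, 21, 20, 57),
    datetime(2004, 3, 21, 21, 0), datetime(2004, 3, 21, 23, 54),
    datetime(2004, 3, 22, 4, 16), datetime(2004, 3, 22, 6, 10),
    datetime(2004, 3, 22, 17, 4), datetime(2004, 3, 22, 21, 28),
    datetime(2004, 3, 23, 10, 11), datetime(2004, 3, 23, 11, 37),
    datetime(2004, 3, 23, 14, 34), datetime(2004, 4, 14, 10, 42),
]

def split_burst(times, max_gap=timedelta(days=7)):
    """Split posts into the 'immediate' burst and any late contributions.

    max_gap is a hypothetical cut-off: once a gap between consecutive posts
    exceeds it, every later post counts as a late contribution.
    """
    times = sorted(times)
    burst, late = [times[0]], []
    for prev, cur in zip(times, times[1:]):
        if late or cur - prev > max_gap:
            late.append(cur)
        else:
            burst.append(cur)
    return burst, late

burst, late = split_burst(POSTS)
burst_span = burst[-1] - burst[0]
late_gap = late[0] - burst[-1]
print(len(burst), burst_span, len(late), late_gap)
```

The burst of eleven posts spans under 3 days, and the one late post arrives more than 3 weeks after the burst ends, which is the pattern the processing reads as a sign of a core motivator.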

The full process is very much more complicated than shown within this example, but not much more complex; it is merely more of the same, hopefully into the millions. Also, while we are examining text here, the processing is such that these forms of text examples are usually *NOT* present for the humans providing interpretation.

Rather, the processing at this stage continues until a word is derived (in this case 'housing') which is either an entity or a component of one. Then a series of 4 digit hexadecimal values is presented along with the identifier. It might look like this:

fprd_109('housing', 23A9, 0002, 0000, [E456,E45A,E4CA,...], ......)

In this case, this would represent the finding of the word 'fear', a bespoke emotive value, exactly 2 times (0002) without (the 0000 part) any positive modifiers (such as 'not' or 'no', which would have turned the 'fear' into a positive affirmation in the sense of 'no fear'), and further that the 'fear' was cross linked to the aspects identified within the sub list array (those numbers bound by the brackets and processed in a list fashion within Prolog). In this particular instance, the array within brackets links back to the 'disease' and 'aids' aspect identifiers. The remaining elements within the tuple (bracketed parameters of the predicate) have been deleted for clarity of this example.
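For readers more at home outside Prolog, the visible fields of that predicate can be unpacked in a short Python sketch. The field names (value_code, count, modifiers, links) are labels invented here from the description above, the reading of 23A9 as the code for 'fear' is an assumption, and only the three link values actually shown are used; the elided elements stay elided.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EmotiveEntry:
    identifier: str      # the derived word, e.g. 'housing'
    value_code: int      # assumed: code of the emotive word found ('fear')
    count: int           # how many times it was found
    modifiers: int       # how many positive modifiers ('not', 'no') applied
    links: List[int]     # cross-linked aspect identifiers

    @property
    def unmodified(self):
        """True when no modifier turned the word into its opposite."""
        return self.modifiers == 0

def parse_entry(identifier, value_code, count, modifiers, links):
    """Convert the 4 digit hexadecimal strings into integers."""
    return EmotiveEntry(
        identifier,
        int(value_code, 16),
        int(count, 16),
        int(modifiers, 16),
        [int(link, 16) for link in links],
    )

entry = parse_entry("housing", "23A9", "0002", "0000", ["E456", "E45A", "E4CA"])
print(entry.identifier, entry.count, entry.unmodified, len(entry.links))
```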

In this processing run, some of the other programs would have made the links between the 'disease' aspects and the 'fear' attributes back to 'housing' within the 'economic' entity. Then these are ultimately presented, along with our hoped-for millions of other entries, as a series of points within a three-dimensional model space, carrying values in the properties of color, size and shape.

These colored shapes are displayed within a 3-d grid and will 'naturally' cluster together forming, in a very loose sense, a 'scatter graph' view. Now note that they cluster due to the assignment, within the final predicate heads, of the associations to entities, components, aspects, attributes, counts, intensities, duration, impact, scope and other measures of emotive value relative to discrete objects. In other words, we tell all the 'housing' related shapes to gather around a certain point within our display. This allows for an entity to arise when we are displaying millions of individual elements, among which the example text for this article would appear 'totalled' as perhaps 4 to 20 points on the graph. This is what we refer to as 'serendipity' mode, as the programs decide what is a key 'nomative' identifier. In the case of the text used here, 'housing' would have been the determination due to the many referencing and active words which have been previously associated with 'housing' within the lexicon.
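The 'gather around a certain point' idea can be sketched as follows. The anchor coordinates, the spread, and the place function are all hypothetical; the article does not say how the display positions are actually computed, so this only shows the general effect of clustering by component.

```python
import random

# Hypothetical anchor points in the 3-d display, one per component.
ANCHORS = {
    "housing": (10.0, 0.0, 0.0),
    "banking": (-10.0, 5.0, 0.0),
}

def place(component, rng, spread=1.0):
    """Return a point scattered within `spread` of the component's anchor."""
    return tuple(
        coord + rng.uniform(-spread, spread) for coord in ANCHORS[component]
    )

rng = random.Random(0)  # fixed seed so the layout is repeatable
housing_points = [place("housing", rng) for _ in range(20)]
banking_points = [place("banking", rng) for _ in range(5)]
print(housing_points[0], banking_points[0])
```

Because every 'housing' point is drawn near the same anchor, the twenty points form one visible cluster, well separated from the 'banking' cluster.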

Alternatively, if the programs were unable to find an assignment within the lexicon, or came up with a word that appeared to be a noun not yet captured within the lexicon, then it gets kicked out of the processing for human review....which is a whole other article covering the several hundred thousand that get kicked out for one of many hundreds of reasons.....

If running in 'seek' mode, the graph will display links clustering around entities where all the elements are told to 'gather round' if they have a particular word or phrase set within their data sets.
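A minimal Python sketch of that 'seek' selection might look like the following. The DATA_SETS contents and the seek function are invented for the example; they only show the idea of picking out elements whose data sets contain a sought word or phrase.

```python
# Hypothetical data sets: element identifier -> words attached to it.
DATA_SETS = {
    "housing": {"mortgage", "upkeep", "fear"},
    "banking": {"interest rates", "taken back"},
    "disease": {"aids", "fear"},
}

def seek(data_sets, wanted):
    """Return the identifiers whose data set shares a word with `wanted`."""
    wanted = {w.lower() for w in wanted}
    return sorted(
        name
        for name, words in data_sets.items()
        if wanted & {w.lower() for w in words}
    )

gathered = seek(DATA_SETS, {"Fear"})
print(gathered)
```

Seeking on 'fear' gathers both 'disease' and 'housing', while 'banking' stays where it was.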

As can be seen, this is a whole lot of brain draining work for each processing, especially as the language is changing rapidly and the work is not limited to a single language. Such things as geographic identifiers have to be either discarded from processing or folded in if they represent a discrete nomative value emerging...such as might be the word 'Bagdad', which is now making the transition from a geographic reference to an English language emotive carrier, and as such will have to be included within future processing as its emotive associations evolve. This, of course, is put into our lexicon by one of the many, many maintenance programs.

We also note that the evolution of language is not limited to merely new geographic references, but frequently we import foreign words with or without changing the meaning into English. Then we stamp them with a beginning emotive context which will 'settle down' over time as we all agree that 'eyup, it do mean that'.

The above is a basic primer on how the bot processing works. There are acres of more detail which would consume vast quantities of time, so we will put that aside. As to the interpretation of the graph, well, while that is an article of its own for the future, suffice it to say that one looks at the various clusters of colored dots and squiggles and other shapes, and then through clicking down, derives a number reference which, when given to the Prolog interpreter with the correct predicate calls, will drill back down to where that numeric value derived, including which word, which modifying words, which linked aspects/entities/attributes, and complete aspect attribute sets of origination. However, unlike the example above, which was snatched from processing very early, the human would not find out where or when or who wrote the items that were processed into the words making up the entry on the chart. At most they would be able to see the aspect/attribute sets. These are the lists of words ranked by emotive values, which in this case would be similar to some of the lists extracted and isolated above. It is then, within the context of the other, potentially millions of dots within the entity, up to the human to select representative value words to use in the interpretation.

Clear as mud?