Find Reply Module
Architecture Overview
Attributes
- Aggregates content from different sources
- Spends most of its time waiting for replies from those sources
- Must use multithreading to serve RPC requests from RASA and optimize performance
Docker Configuration
ENVIRONMENT VARIABLES
Mandatory
- RMQ_IP: RabbitMQ broker FQDN or IP
- QUESTIONS_QUEUE: Questions from RASA (Ex: tbott-rpc_.dom.proj_)
Optional
- NB_INSTANCES: number of allocated threads
► limits the number of questions processed in parallel
► defines service level capacities (dimensioning rule)
► default = 5
- RMQ_USER: ...
- RMQ_PASSWD: ...
- ENCODER_QUEUE: tbott-encoder (Bert encoder queue)
- MATCHING_QUEUE: tbott-matching (Sentence matching queue)
- RMQ_PORT: 5672
- LOG_LEVEL: INFO
VOLUMES
- logs: mounted in /var/log/supervisor
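As a sketch, the variables above could be wired into a docker-compose service; every value below is a placeholder, not a real deployment value:

```yaml
services:
  find-reply:
    image: find-reply:latest          # hypothetical image name
    environment:
      RMQ_IP: rabbitmq.example.com    # placeholder broker host
      QUESTIONS_QUEUE: tbott-rpc      # placeholder queue name
      NB_INSTANCES: "5"               # default
      RMQ_PORT: "5672"
      LOG_LEVEL: INFO
    volumes:
      - ./logs:/var/log/supervisor
```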
Communications
Most of the code can be reused from the previous version of Maestro.
Two types of requests
To remain compatible with the existing system:
- "question": questions coming from RASA
- "analyse_sentence": from the Analyser
► can be implemented in a second or third stage (not mandatory)
► reuse code from the previous Maestro version
► needs to generate a list of questions with one and two words missing
► the number of free threads must be set to 0, or heavily decreased, to save resources
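The "one and two words missing" requirement can be sketched as follows (a minimal illustration; the function name questions_with_missing_words is hypothetical):

```python
from itertools import combinations

def questions_with_missing_words(question, max_missing=2):
    """Generate variants of `question` with 1..max_missing words removed."""
    words = question.split()
    variants = []
    for n in range(1, max_missing + 1):
        if len(words) <= n:
            break  # never drop every word
        for idx in combinations(range(len(words)), n):
            variants.append(" ".join(w for i, w in enumerate(words)
                                     if i not in idx))
    return variants
```

For a 4-word question this yields 4 one-word-missing variants and 6 two-word-missing variants.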
code trails
Line _51_ of ai_worker.py:
self.worker = Worker(ip, user, passwd, channel, self.handle_message)
Line _57_ of ai_worker.py:
def handle_message(self, msg):
    """
    Callback to handle messages
    """
    LOGGER.debug('Received: %s', msg)
    try:
        request = json.loads(msg)
        if request['type'] == "question":
            full_answer = self.find_reply(request)
            return json.dumps(full_answer)
        elif request['type'] == "analyse_sentence":
            full_answer = self.analyse_sentence(request['content'],
                                                request['categories'],
                                                request['reference'])
            return json.dumps(full_answer)
        ...
Step 1: contexts
NB: only if the message is a "question" (i.e. it comes from RASA).
Get all possible categories (or contexts) from Admin.
request:
{ "request": "get_contexts", "tag": "prev_tag", "content": "cat1" }
- tag: the tag given in the previous request's result. Set to None if it doesn't exist.
- content: optional. It can be used to filter categories if one is provided in the question. Omit it if not needed.
reply:
{ "request": "get_contexts",
  "result": {
    "tag": "update_tag",
    "dictionary": ["word_a: word_b", "word_x: word_y"],
    "contexts": [
      {"cat1": [{"cat1": "DDAY"}]},
      {"cat1_cat2": [
        {"cat1": "D-DAY", "cat2": "Bilan"},
        {"cat1": "D-DAY", "cat2": "Cimetières"}
      ]}
    ]
  }
}
- tag: a tag to track updates. If the tag in the request is equal to the last update tag, then dictionary and contexts will be empty and the previous dictionary and contexts should be used.
- dictionary: a dictionary to pass to wordtools
- contexts: list of all contexts
- Extend the question list with all possible contexts.
- The list of questions must be preprocessed with the wordtools library, version 4.0 or higher:
from wordtools.text_processor import Document
...
doc = Document('', dictionary=dictionary)
processed = []
for q in questions:
    doc.text = q
    doc.clean()
    doc.apply_dictionary()
    if auto_correct:
        doc.auto_correct()
    processed.append(doc.text)
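The context extension can be sketched like this, following the prefix pattern visible in the augmented questions of Step 4 (the exact extension format is an assumption):

```python
def extend_question(question, contexts):
    """Prefix `question` with every context combination from get_contexts."""
    extended = [question]
    for group in contexts:            # e.g. {"cat1_cat2": [...]}
        for combos in group.values():
            for combo in combos:      # e.g. {"cat1": "D-DAY", "cat2": "Bilan"}
                prefix = ", ".join(combo.values())
                extended.append(f"{prefix}, {question}")
    return extended
```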
Step 2: vectors
Send the question list to the RabbitMQ $ENCODER_QUEUE using the Messaging library.
Request example:
{
"request": "encode",
"content": ["original_question", "extended_question1", "..."]
}
Reply example:
{
"result": [numpy.array, numpy.array, numpy.array]
}
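The reply is a list of one vector per question, while FAISS expects a single 2-D float32 matrix; the vectors can be stacked with NumPy (the 768-dimensional size matches LaBSE but is an assumption about the encoder's output):

```python
import numpy as np

def stack_vectors(result):
    """Stack the encoder's per-question vectors into one float32 matrix."""
    return np.vstack(result).astype("float32")
```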
Step 3: candidates
Get possible candidates from the FAISS module (127.0.0.1:14000).
Search request example:
{ "requests" : ["search"],
"search" : [numpy.array , numpy.array]
}
FAISS reply example:
{ "result" : numpy.array }
The FAISS module sends back a matrix of size 2 x nb_questions x nb_results:
- 2 : IDs and scores (in this order)
- nb_questions: number of questions sent to FAISS
- nb_results: number of results per question (top k results)
If the question comes from RASA (i.e. message type = "question"), then all the questions are derived from the original question.
This means that the results must be consolidated into a single list of IDs.
This should be done with NumPy, as Python loops are quite time-consuming.
Code snippet:
res = search_faiss(q_vectors)
I = res[0]
S = res[1]
NQ = I.shape[0]  # Number of questions
K = I.shape[1]   # Number of results per question
I_ = np.reshape(I, NQ * K)  # reshape matrix into a flat list
S_ = np.reshape(S, NQ * K)  # idem
res_ids = np.argsort(-S_)  # indexes sorted by descending score (higher cosine = better)
# first (i.e. best-scored) occurrence of each result
_, i = np.unique(I_[res_ids], return_index=True)
I_ = I_[res_ids][i]  # keep unique ids
S_ = S_[res_ids][i]  # keep their best scores
final_idx = np.argsort(-S_)  # indexes of the sorted scores
final_idx = final_idx[:K]  # keep only the first K results
final_ids = I_[final_idx]
final_scores = S_[final_idx]
return final_ids, final_scores
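The consolidation logic can be checked on toy data (self-contained sketch; the descending sort assumes higher cosine scores are better):

```python
import numpy as np

def consolidate(I, S, k):
    """Merge FAISS results for all question variants into one top-k list."""
    flat_ids = I.reshape(-1)
    flat_scores = S.reshape(-1)
    order = np.argsort(-flat_scores)                    # best scores first
    _, first = np.unique(flat_ids[order], return_index=True)
    ids = flat_ids[order][first]                        # unique ids, best score kept
    scores = flat_scores[order][first]
    best = np.argsort(-scores)[:k]                      # final top-k
    return ids[best], scores[best]

I = np.array([[1, 2], [2, 3]])
S = np.array([[0.9, 0.5], [0.8, 0.7]])
ids, scores = consolidate(I, S, 2)
# id 2 keeps its best score (0.8); id 3 (0.7) is pushed out of the top 2
```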
Step 4: Questions to match
- Get from Admin the questions and replies associated with the list of candidate IDs.
- request:
{ "request": "get_candidates", "content": [1, 2, 3]}
- reply:
{ "request": "get_candidates",
  "result": [
    {"id": 1, "ref": "DDAY/dday/20",
     "question": "Quelles sont les plages du débarquement ?",
     "augmented": false,
     "reply": "Les 5 plages du débarquement sont ...",
     "cat1": "D-DAY", "cat2": "Lieux de Batailles"},
    {"id": 2, "ref": "DDAY/dday/20",
     "question": "D-DAY, Quelles sont les plages du débarquement ?",
     "augmented": true,
     "reply": "Les 5 plages du débarquement sont ...",
     "cat1": "D-DAY", "cat2": "Lieux de Batailles"},
    {"id": 3, "ref": "DDAY/dday/20",
     "question": "D-DAY, Lieux de Batailles, Quelles sont les plages du débarquement ?",
     "augmented": true,
     "reply": "Les 5 plages du débarquement sont ...",
     "cat1": "D-DAY", "cat2": "Lieux de Batailles"}
  ]}
- Build a list of paired questions.
- Augmented questions must be paired only with questions from the same categories/contexts
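The pairing rule can be sketched as follows, using the field names from the get_candidates reply (the categories dict describing the question's detected contexts is an assumption):

```python
def build_pairs(question, categories, candidates):
    """Pair the question with each candidate; augmented candidates must share its contexts."""
    pairs = []
    for cand in candidates:
        if cand["augmented"]:
            # skip augmented candidates whose categories differ from the question's
            if any(cand.get(key) != value for key, value in categories.items()):
                continue
        pairs.append((question, cand["question"]))
    return pairs
```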
Step 5: Best match
Starting with the LaBSE encoder, cosine similarity scores (provided by the FAISS module) can be used directly as the matching score if:
- the score is >= 0.90 for a question with more than 6 words
- the score is >= 0.75 for a question with 6 words or fewer
This shortcut improves latency without sacrificing accuracy.
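The two thresholds translate directly into a small helper (a sketch; the word count is taken from a plain whitespace split):

```python
def can_shortcut(score, question):
    """True when the FAISS cosine score alone can be trusted as the match score."""
    threshold = 0.90 if len(question.split()) > 6 else 0.75
    return score >= threshold
```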
- Send the list of question pairs to the RabbitMQ $MATCHING_QUEUE using the Messaging library.
- Build the final reply with the best match and the other candidates.
- Chunks of the previous version of Maestro can be reused to speed up development.
Testing procedure