Supermarkets and online stores usually offer a large assortment of products. This assortment gives consumers a wide range of choice and attracts different consumer segments. On the other hand, it can make it harder for consumers to find all the products they want to buy. Cooking recipes are a source of information that can be used to ease the purchase of products. The number of cooking recipes shared through the World Wide Web has been increasing. They are shared not only by professional chefs but also by “home” chefs. These recipes are often written as free-form text, which makes it difficult for a computer to process the information. The aim of this thesis is to develop a system that transforms a recipe written in Portuguese into a shopping list. The ingredients and quantities extracted from the recipe are matched against a real-world product database, and the result is returned to the consumer.
The problem is divided into two steps: the first extracts relevant information from the recipe, and the second matches the ingredients needed to prepare the recipe with products from a real retailer database. In the extraction step, the goal is to extract the name of each ingredient and its quantity from the unstructured text. This is done by applying a Conditional Random Fields model. The next step in the system is to match the extracted ingredients with products present in a database. This phase poses some challenges. The match between the information extracted from the recipe and the database may not always produce the expected results, because of noise, the use of synonyms, or misspelled words. To overcome this challenge, it is necessary to use external resources such as ontologies, thesauri, or taxonomies.
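The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: a regular expression stands in for the Conditional Random Fields model, a one-entry synonym table stands in for an ontology or thesaurus, and the standard library's `difflib.SequenceMatcher` stands in for the similarity measure. The product list, the synonym table, the threshold and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical product catalogue; a real system would query the retailer database.
PRODUCTS = ["farinha de trigo", "açúcar branco", "leite meio-gordo"]

# Hypothetical synonym table standing in for an ontology or thesaurus.
SYNONYMS = {"açúcar refinado": "açúcar branco"}

# Stand-in for the CRF tagger: quantity, optional unit, optional "de", ingredient name.
LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[.,]\d+)?)\s*(?P<unit>g|kg|ml|l)?\s+(?:de\s+)?(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def extract(line):
    """Step 1: return (name, quantity, unit) or None if the line does not parse."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    quantity = float(m.group("qty").replace(",", "."))
    return m.group("name").lower(), quantity, m.group("unit")


def match_product(name, threshold=0.5):
    """Step 2: return the most similar product, or None below the threshold."""
    name = SYNONYMS.get(name.lower(), name.lower())  # resolve known synonyms first
    best = max(PRODUCTS, key=lambda p: SequenceMatcher(None, name, p).ratio())
    score = SequenceMatcher(None, name, best).ratio()
    return best if score >= threshold else None


def shopping_list(recipe_lines):
    """Run both steps and keep only the ingredients that found a product."""
    items = []
    for line in recipe_lines:
        parsed = extract(line)
        if parsed is None:
            continue
        name, quantity, unit = parsed
        product = match_product(name)
        if product is not None:
            items.append((product, quantity, unit))
    return items
```

Calling `shopping_list(["200 g de farinha", "100 g de açúcar refinado"])` pairs each line with a catalogue entry; the second line only matches because the synonym table maps it to the catalogue's wording, which is the role the external resources play in the real system.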
To evaluate the developed components, two more datasets were annotated in addition to a previously available one. The information extraction achieved an F1 of 0.98 in one of the experiments. The similarity matching achieved an average F1 of 0.354 and an average precision of 0.364 in one of the experiments with a path-based measure.
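For reference, precision and F1 for a single query can be computed as in the sketch below. It uses the standard set-based definitions; the function name and the example items are hypothetical, and how the thesis averages the scores across queries is not stated here.

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1 for one query."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With four returned products of which two are among three correct ones, precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.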
As previously mentioned, it is believed that this system will ease the customer's work in creating a shopping list from a recipe.
Author: Marcelo Ferreira
Type: MSc thesis
Partners: Faculdade de Engenharia da Universidade do Porto; Sonae Modelo Continente SA