Introduction
The field of EV (Electric Vehicle) battery manufacturing is a very active and competitive area of research that is projected to grow substantially in the coming years. Technology watch and intellectual property (IP) are therefore critical topics for EV battery manufacturers, who must protect their inventions while constantly screening for competing patents as early as possible in their R&D process. However, while critical, this task is also very time-consuming and error-prone, as it often involves manually querying and reviewing large patent databases.
A possible way of helping IP engineers and domain experts in their technology watch is to leverage (Large) Language Models to automatically retrieve, gather and aggregate relevant information from patents. Doing so makes it possible to provide them with an automated "digest" that can speed up the screening process. This work presents a Proof of Concept (POC) of this strategy, focusing on EV battery manufacturing patents.
Methodology
The initial step consists in translating all patents at hand into the same language. As English is the most common language in patent offices, we chose to translate into English all patents written in another language, most often Japanese, Chinese, Korean and French. Building the digest then consists of three steps: (1) clustering patent abstracts on a semantic basis and, for each cluster, (2) extracting a list of technical keywords defining the cluster and (3) generating a short summary describing that cluster using generative models.
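As an illustration, the translation step could rely on an off-the-shelf Hugging Face translation model. The sketch below is a minimal example assuming a MarianMT checkpoint (Helsinki-NLP/opus-mt-ja-en) for the Japanese-to-English case; the abstract does not specify which translation tool was actually used.

    from transformers import pipeline

    # Assumption: a MarianMT checkpoint is used; the project may rely on another tool.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")

    abstract_ja = "..."  # raw patent abstract in Japanese
    abstract_en = translator(abstract_ja, max_length=512)[0]["translation_text"]
    print(abstract_en)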
For the clustering step (1), we used pre-trained and fine-tuned transformer backbones and benchmarked their clustering performance both quantitatively and qualitatively in order to identify the best one. For keyword extraction (2), we compared several approaches, from conventional methods based on term and document frequency, to word-embedding-based methods and pure LLM prompt engineering.
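As a minimal sketch of steps (1) and (2), the snippet below embeds abstracts with a sentence-transformers backbone, clusters them with K-Means, and extracts keywords with a TF-IDF baseline. The backbone name, the number of clusters and the keyword method are illustrative placeholders, not the configurations benchmarked in this work.

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer
    import numpy as np

    abstracts = [...]  # English patent abstracts, one string per patent

    # (1) Embed abstracts with a pre-trained transformer and cluster them semantically.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative backbone
    embeddings = encoder.encode(abstracts, normalize_embeddings=True)
    labels = KMeans(n_clusters=8, random_state=0).fit_predict(embeddings)

    # (2) Simple TF-IDF keyword baseline: keep the top-scoring terms of each
    # cluster's concatenated abstracts (embedding- or LLM-based extraction
    # can be swapped in here).
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    cluster_docs = [" ".join(a for a, l in zip(abstracts, labels) if l == c)
                    for c in sorted(set(labels))]
    tfidf = vectorizer.fit_transform(cluster_docs)
    terms = np.array(vectorizer.get_feature_names_out())
    for c, row in enumerate(tfidf.toarray()):
        print(f"cluster {c}:", ", ".join(terms[row.argsort()[::-1][:10]]))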
Finally, for step (3), we used LLMs such as Llama-2, accessed through Hugging Face and combined with LangChain, to generate a summary of each cluster. To assist IP engineers and domain experts when exploring large patent corpora, all these elements are displayed in a user-friendly app. The LLM is deployed as a containerized API using Hugging Face's text-generation-inference tool.
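A minimal sketch of the summarization step (3), assuming a Llama-2 model already served by a text-generation-inference container at a local endpoint; the URL, prompt wording and generation parameters are illustrative, and the plain huggingface_hub client is used here instead of LangChain for brevity.

    from huggingface_hub import InferenceClient

    # Assumption: a text-generation-inference container serving Llama-2
    # is reachable at this URL.
    client = InferenceClient(model="http://localhost:8080")

    cluster_abstracts = [...]  # abstracts belonging to one cluster
    prompt = (
        "Summarize in a few sentences the common technical topic of the "
        "following patent abstracts:\n\n"
        + "\n\n".join(cluster_abstracts)
        + "\n\nSummary:"
    )
    summary = client.text_generation(prompt, max_new_tokens=200, temperature=0.3)
    print(summary)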
Originality / perspectives
The originality of this work lies in the fact that the project covered all the steps of an artificial intelligence POC, from framing to modeling, performance evaluation and MVP deployment. Our talk will be an opportunity to share takeaways on the challenges we faced during these steps, especially hardware/infrastructure selection and deployment. For future work, we plan to collect more data and carry out more thorough benchmarks of each step of the pipeline. Moreover, we plan to redesign and redeploy this product on a wider scale at Automotive Cells Company.