The 2023 Dataset



The dataset is a rich compilation of recipes spanning a wide range of cuisines and styles. It offers a perspective on what makes a recipe more than just a list of ingredients and steps. With over 500,000 recipes, it’s a deep dive into the culinary world, giving data enthusiasts, chefs, and food bloggers an opportunity to analyze and understand cooking trends at scale.

BERT extracts key entities from raw text, and a model can be trained to recognize food and ingredient nouns.

One of the most exciting parts of preparing this dataset was using BERT (Bidirectional Encoder Representations from Transformers) to extract ingredient names from the raw scraped data. BERT, a Transformer-based language model, helped us identify and categorize ingredients in diverse recipe descriptions. This not only improved the accuracy of our ingredient list but also provided fascinating insights into how different ingredients are used across cuisines and recipes.
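To make the extraction step a little more concrete, here is a minimal sketch of the post-processing that usually follows a token classifier: merging per-token tags into whole ingredient names. It does not load BERT or reproduce the pipeline used for this dataset. The `B-INGREDIENT` / `I-INGREDIENT` label names, the `merge_bio_tags` helper, and the hand-tagged example are all assumptions made for illustration.

```python
from typing import List, Tuple


def merge_bio_tags(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Group (token, tag) pairs into ingredient names using BIO tags.

    Assumed label scheme (hypothetical, not taken from the dataset):
      B-INGREDIENT  first token of an ingredient name
      I-INGREDIENT  continuation of the current ingredient name
      anything else token outside an ingredient name
    """
    ingredients: List[str] = []
    current: List[str] = []
    for token, tag in tagged_tokens:
        if tag == "B-INGREDIENT":
            # A new ingredient starts; flush the previous one if any.
            if current:
                ingredients.append(" ".join(current))
            current = [token]
        elif tag == "I-INGREDIENT" and current:
            # Continue the ingredient that is currently open.
            current.append(token)
        else:
            # Outside token (or a stray I- tag with nothing open):
            # close whatever ingredient was being built.
            if current:
                ingredients.append(" ".join(current))
            current = []
    if current:
        ingredients.append(" ".join(current))
    return ingredients


# Hand-tagged example standing in for a model's predictions.
example = [
    ("2", "O"), ("cups", "O"),
    ("olive", "B-INGREDIENT"), ("oil", "I-INGREDIENT"),
    (",", "O"), ("1", "O"), ("clove", "O"),
    ("garlic", "B-INGREDIENT"),
]
print(merge_bio_tags(example))  # ['olive oil', 'garlic']
```

In a real setup, the tags would come from a BERT model trained for token classification, and subword pieces would need to be joined back into words before this step.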



For instance, certain ingredients appear far more frequently than others, which suggests broad appeal across cuisines. The dataset also reveals trends in meal types, cooking methods, and dietary preferences, offering a glimpse into both the most popular and the most unusual aspects of cooking!
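As a rough illustration of how ingredient frequency might be measured, the sketch below counts how many recipes each ingredient appears in. The `ingredient_recipe_counts` function and the three toy recipes are made up for this example and are not drawn from the dataset; the real data would likely need more careful normalization than lowercasing.

```python
from collections import Counter
from typing import Iterable, List


def ingredient_recipe_counts(recipes: Iterable[List[str]]) -> Counter:
    """Count the number of recipes in which each ingredient appears."""
    counts: Counter = Counter()
    for ingredients in recipes:
        # Lowercase and deduplicate so an ingredient listed twice in
        # one recipe is only counted once for that recipe.
        counts.update({name.strip().lower() for name in ingredients})
    return counts


# Toy recipes, invented for illustration only.
toy_recipes = [
    ["Garlic", "olive oil", "tomato"],
    ["garlic", "butter", "garlic"],
    ["flour", "butter", "sugar"],
]

counts = ingredient_recipe_counts(toy_recipes)
for name, n in counts.most_common(3):
    print(f"{name}: appears in {n} recipe(s)")
```

Counting recipes rather than raw mentions keeps a single recipe that repeats an ingredient from skewing the ranking.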

This dataset is for anyone interested in food, cooking, and data analysis. Whether you’re a data scientist looking to apply machine learning to recipe recommendation systems, a chef exploring new recipe ideas, or a food enthusiast curious about culinary trends, this dataset has something for you. Stay tuned, and join in on sharing more insights from this fascinating culinary dataset. Happy data cooking!


By Alexander Wei

BA, MS Mathematics, Tufts University

