Tobias Macey presents
The Machine Learning Podcast.__init__
The podcast the constantly changing world of machine learning and data science
October 16, 2021
An Exploration Of Financial Exchange Risk Management Strategies
The world of finance has driven the development of many sophisticated techniques for data analysis. In this episode Paul Stafford shares his experiences working in the realm of risk management for financial exchanges. He discusses the types of risk that are involved, the statistical methods that he has found most useful for identifying strategies to mitigate that risk, and the software libraries that have helped him most in his work.
October 9, 2021
Build Better Machine Learning Models By Understanding Their Decisions With SHAP
Machine learning and deep learning techniques are powerful tools for a large and growing number of applications. Unfortunately, it is difficult or impossible to understand the reasons for the answers that they give to the questions they are asked. In order to help shine some light on what information is being used to provide the outputs to your machine learning models Scott Lundberg created the SHAP project. In this episode he explains how it can be used to provide insight into which features are most impactful when generating an output, and how that insight can be applied to make more useful and informed design choices. This is a fascinating and important subject and this episode is an excellent exploration of how to start addressing the challenge of explainability.
September 30, 2021
Accelerating Drug Discovery Using Machine Learning With TorchDrug
Finding new and effective treatments for disease is a complex and time consuming endeavor, requiring a high degree of domain knowledge and specialized equipment. Combining his expertise in machine learning and graph algorithms with is interest in drug discovery Jian Tang created the TorchDrug project to help reduce the amount of time needed to find new candidate molecules for testing. In this episode he explains how the project is being used by machine learning researchers and biochemists to collaborate on finding effective treatments for real-world diseases.
September 26, 2021
An Exploration Of Automated Speech Recognition
The overwhelming growth of smartphones, smart speakers, and spoken word content has corresponded with increasingly sophisticated machine learning models for recognizing speech content in audio data. Dylan Fox founded Assembly to provide access to the most advanced automated speech recognition models for developers to incorporate into their own products. In this episode he gives an overview of the current state of the art for automated speech recognition, the varying requirements for accuracy and speed of models depending on the context in which they are used, and what is required to build a special purpose model for your own ASR applications.
September 19, 2021
Experimenting With Reinforcement Learning Using MushroomRL
Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D'Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work.
September 10, 2021
Doing Dask Powered Data Science In The Saturn Cloud
A perennial problem of doing data science is that it works great on your laptop, until it doesn't. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy to use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signall, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.
September 3, 2021
Monitor The Health Of Your Machine Learning Products In Production With Evidently
You've got a machine learning model trained and running in production, but that's only half of the battle. Are you certain that it is still serving the predictions that you tested? Are the inputs within the range of tolerance that you designed? Monitoring machine learning products is an essential step of the story so that you know when it needs to be retrained against new data, or parameters need to be adjusted. In this episode Emeli Dral shares the work that she and her team at Evidently are doing to build an open source system for tracking and alerting on the health of your ML products in production. She discusses the ways that model drift can occur, the types of metrics that you need to track, and what to do when the health of your system is suffering. This is an important and complex aspect of the machine learning lifecycle, so give it a listen and then try out Evidently for your own projects.
August 25, 2021
Making Automated Machine Learning More Accessible With EvalML
Building a machine learning model is a process that requires a lot of iteration and trial and error. For certain classes of problem a large portion of the searching and tuning can be automated. This allows data scientists to focus their time on more complex or valuable projects, as well as opening the door for non-specialists to experiment with machine learning. Frustrated with some of the awkward or difficult to use tools for AutoML, Angela Lin and Jeremy Shih helped to create the EvalML framework. In this episode they share the use cases for automated machine learning, how they have designed the EvalML project to be approachable, and how you can use it for building and training your own models.
August 19, 2021
Growing And Supporting The Data Science Community At Anaconda
Data scientists are tasked with answering challenging questions using data that is often messy and incomplete. Anaconda is on a mission to make the lives of data professionals more manageable through creation and maintenance of high quality libraries and frameworks, the distribution of an easy to use Python distribution and package ecosystem, and high quality training material. In this episode Kevin Goldsmith, CTO of Anaconda, discusses the technical and social challenges faced by data scientists, the ways that the Python ecosystem has evolved to help address those difficulties, and how Anaconda is engaging with the community to provide high quality tools and education for this constantly changing practice.
August 15, 2021
Network Analysis At The Speed Of C With The Power Of Python Using NetworKit
Analysing networks is a growing area of research in academia and industry. In order to be able to answer questions about large or complex relationships it is necessary to have fast and efficient algorithms that can process the data quickly. In this episode Eugenio Angriman discusses his contributions to the NetworKit library to provide an accessible interface for these algorithms. He shares how he is using NetworKit for his own research, the challenges of working with large and complex networks, and the kinds of questions that can be answered with data that fits on your laptop.
August 4, 2021
Delivering Deep Learning Powered Speech Recognition As A Service For Developers At AssemblyAI
Building a software-as-a-service (SaaS) business is a fairly well understood pattern at this point. When the core of the service is a set of machine learning products it introduces a whole new set of challenges. In this episode Dylan Fox shares his experience building Assembly AI as a reliable and affordable option for automatic speech recognition that caters to a developer audience. He discusses the machine learning development and deployment processes that his team relies on, the scalability and performance considerations that deep learning models introduce, and the user experience design that goes into building for a developer audience. This is a fascinating conversation about a unique cross-section of considerations and how Dylan and his team are building an impressive and useful service.
July 28, 2021
Taking Aim At The Legacy Of SQL With The Preql Relational Language
SQL has gone through many cycles of popularity and disfavor. Despite its longevity it is objectively challenging to work with in a collaborative and composable manner. In order to address these shortcomings and build a new interface for your database oriented workloads Erez Shinan created Preql. It is based on the same relational algebra that inspired SQL, but brings in more robust computer science principles to make it more manageable as you scale in complexity. In this episode he shares his motivation for creating the Preql project, how he has used Python to develop a new language for interacting with database engines, and the challenges of taking on the legacy of SQL as an individual.
July 22, 2021
Unleash The Power Of Dataframes At Any Scale With Modin
When you start working on a data project there are always a variety of unknown factors that you have to explore. One of those is the volume of total data that you will eventually need to handle, and the speed and scale at which it will need to be processed. If you optimize for scale too early then it adds a high barrier to entry due to the complexities of distributed systems, but if you invest in a lot of engineering up front then it can be challenging to refactor for scale. Modin is a project that aims to remove that decision by letting you seamlessly replace your existing Pandas code and scale across CPU cores or across a cluster of machines. In this episode Devin Petersohn explains why he started working on solving this problem, how Modin is architected to allow for a smooth escalation from small to large volumes of data and compute, and how you can start using it today to accelerate your Pandas workflows.
July 14, 2021
Exploring The SpeechBrain Toolkit For Speech Processing
With the rising availability of computation in everyday devices, there has been a corresponding increase in the appetite for voice as the primary interface. To accomodate this desire it is necessary for us to have high quality libraries for being able to process and generate audio data that can make sense of human speech. To facilitate research and industry applications for speech data Mirco Ravanelli and Peter Plantinga are building SpeechBrain. In this episode they explain how it works under the hood, the projects that they are using it for, and how you can get started with it today.
July 7, 2021
Fast And Educational Exploration And Analysis Of Graph Data Structures With graph-tool
If you are interested in a library for working with graph structures that will also help you learn more about the research and theory behind the algorithms then look no further than graph-tool. In this episode Tiago Peixoto shares his work on graph algorithms and networked data and how he has built graph-tool to help in that research. He explains how it is implemented, how it evolved from a simple command line tool to a full-fledged library, and the benefits that he has found from building a personal project in the open.
June 30, 2021
Lightening The Load For Deep Learning With Sparse Networks Using Neural Magic
Deep learning has largely taken over the research and applications of artificial intelligence, with some truly impressive results. The challenge that it presents is that for reasonable speed and performance it requires specialized hardware, generally in the form of a dedicated GPU (Graphics Processing Unit). This raises the cost of the infrastructure, adds deployment complexity, and drastically increases the energy requirements for training and serving of models. To address these challenges Nir Shavit combined his experiences in multi-core computing and brain science to co-found Neural Magic where he is leading the efforts to build a set of tools that prune dense neural networks to allow them to execute on commodity CPU hardware. In this episode he explains how sparsification of deep learning models works, the potential that it unlocks for making machine learning and specialized AI more accessible, and how you can start using it today.
June 23, 2021
Finding The Core Of Python For A Bright Future With Brett Cannon
Brett Cannon has been a long-time contributor to the Python language and community in many ways. In this episode he shares some of his work and thoughts on modernizing the ecosystem around the language. This includes standards for packaging, discovering the true core of the language, and how to make it possible to target mobile and web platforms.
June 16, 2021
Traversing The Challenges And Promise Of Graph Machine Learning
The foundation of every ML model is the data that it is trained on. In many cases you will be working with tabular or unstructured information, but there is a growing trend toward networked, or graph data sets. Benedek Rozemberczki has focused his research and career around graph machine learning applications. In this episode he discusses the common sources of networked data, the challenges of working with graph data in machine learning projects, and describes the libraries that he has created to help him in his work. If you are dealing with connected data then this interview will provide a wealth of context and resources to improve your projects.
June 9, 2021
Keep Your Analytics Lint Free With SQLFluff
The growth of analytics has accelerated the use of SQL as a first class language. It has also grown the amount of collaboration involved in writing and maintaining SQL queries. With collaboration comes the inevitable variation in how queries are written, both structurally and stylistically which can lead to a significant amount of wasted time and energy during code review and employee onboarding. Alan Cruickshank was feeling the pain of this wasted effort first-hand which led him down the path of creating SQLFluff as a linter and formatter to enforce consistency and find bugs in the SQL code that he and his team were working with. In this episode he shares the story of how SQLFluff evolved from a simple hackathon project to an open source linter that is used across a range of companies and fosters a growing community of users and contributors. He explains how it has grown to support multiple dialects of SQL, as well as integrating with projects like DBT to handle templated queries. This is a great conversation about the long detours that are sometimes necessary to reach your original destination and the powerful impact that good tooling can have on team productivity.
June 2, 2021
Exploring The Patterns And Practices For Deep Learning With Andrew Ferlitsch
Deep learning is gaining an immense amount of popularity due to the incredible results that it is able to offer with comparatively little effort. Because of this there are a number of engineers who are trying their hand at building machine learning models with the wealth of frameworks that are available. Andrew Ferlitsch wrote a book to capture the useful patterns and best practices for building models with deep learning to make it more approachable for newcomers ot the field. In this episode he shares his deep expertise and extensive experience in building and teaching machine learning across many companies and industries. This is an entertaining and educational conversation about how to build maintainable models across a variety of applications.
© 2021 Tobias Macey