The internet is a big place and most people’s interaction with it is regulated by a few companies paid to sell you things. My team has been building tools for the DARPA Memex project to democratize search for all, with tools that go beyond the surface web and pull out rich structured data to analyze.
In this presentation, we dive into using our Python-based open source tool stack for finding information and utilizing the rest of the Python ecosystem for analysis. The stack provides an interface to crawling, extraction, topic modeling, search indexing, and image analysis.
Andrew Terrell is a computational scientist with experience implementing distributed, large data applications. Currently serving as the Chief Science Officer at Continuum Analytics, he leads the Blaze team taking the Python data stack to the next generation of scalable tools. Andy also works with numerous data scientists in academia and industry, helping architect systems for interacting with large data resources. In his research, he is known for creating novel algorithms to speed implementations of mathematical models on the world’s largest supercomputers.
Andy received his Computer Science PhD at the University of Chicago in 2010. He has held research positions at Argonne National Lab, Sandia National Lab, Institute of Computational Engineering and Sciences at The University of Texas-Austin, and the Texas Advanced Computing Center. In industry, Andy served as lead developer at Kove, Inc. during its early stages, where he helped bring a record-breaking SAN disk array to market. Andy has also worked with big data financial applications during a brief tenure with Enthought, Inc.
Andy is a passionate advocate for open source scientific codes. To this end, he is a board member of the NumFOCUS foundation and has been involved in the wider scientific Python community since 2006. Andy has contributed to numerous projects in the scientific stack and hopes to push for data to become a first-class object for scientists worldwide.