Invited Speakers (Dec 7)

  • Shang-Hua Teng (University of Southern California)
    Proper Learnability and the Role of Unlabeled Data and Regularization

    Abstract: Proper learning refers to the setting in which learners must emit predictors in the underlying hypothesis class H, and often leads to learners with simple algorithmic forms (e.g., empirical risk minimization (ERM), regularization-based structural risk minimization (SRM)). The limitation of proper learning, however, is that there exist problems which can only be learned improperly, e.g., in multiclass classification. Thus, we ask: Under what assumptions on the hypothesis class or the information provided to the learner is a problem properly learnable? We demonstrate that when the unlabeled data distribution is given, there always exists an optimal proper learner governed by distributional regularization, a randomized generalization of regularization. We refer to this setting as the distribution-fixed PAC model, and continue to evaluate the learner on its worst-case performance over all distributions.

    Furthermore, we precisely characterize the role of regularization in perhaps the simplest setting for which ERM and SRM fail: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam’s Razor as embodied by structural risk minimization, the principle of maximum entropy, and Bayesian inference. We extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem’s transductive error rate exactly; we also introduce a generalization of OIGs and the transductive learning setting to the agnostic case. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and we exhibit an optimal learner using maximum entropy programs. This work provides new insights into the role of unlabeled data and how it can empower local regularization in machine learning.

    Joint work (COLT 2024, ALT 2025) with Julian Asilis, Siddartha Devic, Shaddin Dughmi, and Vatsal Sharan
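
    The abstract above centers on ERM and SRM as the canonical proper learners. As a rough point of reference only (not material from the talk), below is a minimal Python sketch of both over a finite hypothesis class; the nested subclasses, the penalty function, and the toy threshold classifiers are assumptions made purely for illustration.

        # Minimal sketch of two proper learners over a finite hypothesis class.
        # ERM returns the hypothesis with the lowest empirical error; SRM adds a
        # complexity penalty that grows with the index of the nested subclass.

        def empirical_error(h, sample):
            """Fraction of labeled examples (x, y) that hypothesis h misclassifies."""
            return sum(1 for x, y in sample if h(x) != y) / len(sample)

        def erm(hypothesis_class, sample):
            """Proper learning via ERM: the output is always a member of the class."""
            return min(hypothesis_class, key=lambda h: empirical_error(h, sample))

        def srm(nested_classes, sample, penalty):
            """SRM over nested classes H_1 <= H_2 <= ...: trade off empirical error
            against a regularization term penalty(k) for using class H_k."""
            best, best_score = None, float("inf")
            for k, H in enumerate(nested_classes, start=1):
                h = erm(H, sample)
                score = empirical_error(h, sample) + penalty(k)
                if score < best_score:
                    best, best_score = h, score
            return best

        # Toy usage: threshold classifiers on the integers (assumed example).
        thresholds = [lambda x, t=t: int(x >= t) for t in range(10)]
        sample = [(1, 0), (2, 0), (6, 1), (8, 1)]
        h_erm = erm(thresholds, sample)
        h_srm = srm([thresholds[:5], thresholds], sample, penalty=lambda k: 0.01 * k)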

  • Yossi Azar (Tel-Aviv University)
    Competitive Bundle Trading

    Abstract: Allocating or selling a set of resources to a sequence of arriving customers has been extensively studied under a wide range of models and objectives. However, a natural extension arises when the algorithm is also allowed to purchase additional resources from suppliers. This setting arguably captures the fundamental scenario faced by any trader or retailer. A natural objective in this context is to maximize the cumulative profit over the online sequence, that is, the difference between revenue from sales and expenses from purchases. This problem of maximizing the difference between two competing quantities appears to be significantly more challenging than the sell-only case, which may explain why it has remained largely unexplored.

    We initiate the study of such a trading problem and design an algorithm that achieves a logarithmic competitive ratio relative to the optimal offline solution. Our approach employs an exponential-weight–update dynamic pricing scheme, and our analysis uses a dual-fitting argument that relates the algorithm’s profit to a linear programming relaxation that upper bounds the optimal offline profit. We also prove (nearly) matching lower bounds. Finally, we extend our results by designing an incentive-compatible mechanism for the setting in which customers are strategic and may misreport their true valuations.

    Joint work with Niv Buchbinder, Roie Levin, and Or Vardi
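
    For readers unfamiliar with the dynamic-pricing idea mentioned above, the Python sketch below shows only the classic exponential posted-pricing rule for selling a single resource online, where the posted price grows from an assumed lower valuation bound L to an upper bound U as the sold fraction approaches 1. It is a simplified sell-side illustration, not the bundle-trading algorithm or analysis from the talk, and all the numbers are made up.

        # Classic exponential posted-pricing rule for selling one resource of a given
        # capacity online, assuming every buyer's per-unit value lies in [L, U].
        # The posted price rises from L to U as the sold fraction goes from 0 to 1,
        # the standard route to a logarithmic competitive ratio. A more careful
        # version would price each unit marginally; this sketch charges the current
        # price for the whole accepted quantity to stay short.

        def run_posted_pricing(buyers, capacity, L, U):
            """buyers: iterable of (value, quantity) requests, served online."""
            sold, revenue = 0.0, 0.0
            for value, quantity in buyers:
                price = L * (U / L) ** (sold / capacity)  # exponential in utilization
                take = min(quantity, capacity - sold)
                if take > 0 and value >= price:
                    sold += take
                    revenue += price * take
            return revenue, sold

        # Toy usage with assumed numbers.
        revenue, sold = run_posted_pricing(
            buyers=[(2.0, 3), (5.0, 4), (1.5, 2), (9.0, 5)],
            capacity=10, L=1.0, U=10.0)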

Special Event

  • Chair: Danny Chen (University of Notre Dame)

Keynote Speakers (Dec 8-10)

  • Piotr Indyk (Massachusetts Institute of Technology)
    Challenges and Opportunities of Graph-Based Algorithms for Similarity Search

    Abstract: Over the last few years, graph-based approaches to nearest neighbor search have attracted renewed interest. Algorithms such as HNSW, NSG, and DiskANN have become popular tools in practice. These algorithms are highly versatile and come with efficient implementations. At the same time, their correctness, performance guarantees, and functionality remain poorly understood. In this talk, I will discuss the challenges and opportunities presented by this class of algorithms.
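
    The algorithms named above (HNSW, NSG, DiskANN) all build on greedy beam search over a proximity graph. Below is a minimal, generic Python sketch of that query primitive under an assumed adjacency-list representation; it is not code from, or equivalent to, any of those libraries.

        import heapq

        # Minimal greedy beam search on a proximity graph, the query primitive shared
        # by graph-based nearest-neighbor indexes. graph: node id -> list of neighbor
        # ids; points: node id -> vector. Generic sketch, not code from any library.

        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        def greedy_beam_search(graph, points, query, entry, beam_width=8, k=5):
            """Walk the graph from `entry`, keeping a beam of the closest nodes seen."""
            visited = {entry}
            d0 = distance(points[entry], query)
            candidates = [(d0, entry)]      # min-heap of nodes still to expand
            best = [(-d0, entry)]           # max-heap (negated) holding the beam
            while candidates:
                dist, node = heapq.heappop(candidates)
                if len(best) >= beam_width and dist > -best[0][0]:
                    break                   # the frontier cannot improve the beam
                for nbr in graph[node]:
                    if nbr in visited:
                        continue
                    visited.add(nbr)
                    d = distance(points[nbr], query)
                    if len(best) < beam_width or d < -best[0][0]:
                        heapq.heappush(candidates, (d, nbr))
                        heapq.heappush(best, (-d, nbr))
                        if len(best) > beam_width:
                            heapq.heappop(best)
            return sorted((-negd, n) for negd, n in best)[:k]

        # Toy usage on a hand-built graph over 2-d points (assumed data).
        points = {0: (0, 0), 1: (1, 0), 2: (2, 2), 3: (5, 5), 4: (6, 5)}
        graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
        print(greedy_beam_search(graph, points, query=(5.5, 5.0), entry=0, k=2))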

  • Mikkel Thorup (University of Copenhagen)
    Hash Functions Bridging the Gap From Theory to Practice

    Abstract: Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Hash functions are used everywhere in computing, e.g., hash tables, sketching, dimensionality reduction, sampling, and estimation. Many of these applications are relevant to machine learning, where we are often interested in similarity between high-dimensional objects, and reducing the dimensionality is key to efficient processing.

    Abstractly, we like to think of hashing as fully random hashing, assigning independent hash values to every possible key, but this essentially requires us to store the hash values for all keys, which is unrealistic for most key universes, e.g., 64-bit keys. In practice we have to settle for implementable hash functions, and practitioners often settle for implementations that are too simple, in the sense that the algorithms end up working only for sufficiently random input. However, the real world is full of structured, non-random input. The issue is severe because simplistic hash functions often work very well in tests with random input; moreover, error events that should never happen in practice can happen with far too high probability. This does not show up in a few tests, but it will show up over time once the system is in production.

    Over the last decade there have been major developments in simple-to-implement tabulation-based hash functions offering strong theoretical guarantees, supporting fundamental properties such as Chernoff bounds, sparse Johnson-Lindenstrauss transforms, and fully random hashing on a given set with high probability. I will discuss some of the principles behind these developments and offer insights on how far we can bridge the gap from theory (assuming fully random hash functions) to practice (needing something that can actually be implemented efficiently).
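
    To make the "simple to implement" point concrete, below is a minimal Python sketch of simple tabulation hashing for 64-bit keys: the key is viewed as eight 8-bit characters, each character indexes its own table of random 64-bit values, and the lookups are XORed. The seed, table layout, and bucket count are assumptions for illustration, and the sketch shows only the basic scheme, not the stronger tabulation-based variants discussed in the talk.

        import random

        # Minimal sketch of simple tabulation hashing for 64-bit keys: split the key
        # into 8 characters of 8 bits, look each character up in its own table of
        # random 64-bit values, and XOR the lookups. Illustrative only.

        random.seed(0)  # fixed seed so the example is reproducible

        NUM_CHARS = 8
        TABLES = [[random.getrandbits(64) for _ in range(256)] for _ in range(NUM_CHARS)]

        def tab_hash(key):
            """Simple tabulation hash of a 64-bit integer key."""
            h = 0
            for i in range(NUM_CHARS):
                char = (key >> (8 * i)) & 0xFF   # extract the i-th 8-bit character
                h ^= TABLES[i][char]
            return h

        # Usage: map a few keys to buckets in an assumed hash table of size 2**20.
        for key in (0, 1, 2**63 - 1, 123456789):
            print(key, tab_hash(key) % (1 << 20))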