Learning to classify data points with a halfspace is one of the foundational problems in machine learning. It is known that learning optimal halfspaces is statistically possible even when we are agnostic to the generative model of the data, but matching this guarantee is computationally hard in general. We introduce a model of misspecification for generalized linear models in which the misspecification is benign/semirandom: the noise level at any point can only be decreased. We give new algorithms that achieve close-to-optimal guarantees in this model, even though standard heuristics fail quite badly. In particular, our results resolve a number of long-standing open questions for a well-studied special case of our model: learning halfspaces with Massart noise.
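To make the special case concrete, here is a minimal NumPy sketch of label generation under Massart noise as commonly defined: an adversary flips each clean halfspace label independently with a point-dependent probability eta(x) bounded by some eta_max < 1/2, and may only lower the noise level below that bound. All names (`eta_max`, `w`, dimensions) are illustrative choices, not from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 1000
# Unknown target halfspace through the origin.
w = rng.normal(size=d)
w /= np.linalg.norm(w)

X = rng.normal(size=(n, d))
clean = np.sign(X @ w)          # noiseless labels in {-1, +1}

# Massart condition: per-point flip probability eta(x) <= eta_max < 1/2.
eta_max = 0.2
# The (semirandom) adversary chooses eta(x); it may only reduce noise
# below the bound, never exceed it.
eta_x = rng.uniform(0.0, eta_max, size=n)
flips = rng.random(n) < eta_x
y = np.where(flips, -clean, clean)
```

Because the flip probabilities are capped below 1/2, the Bayes-optimal classifier is still the target halfspace itself; the difficulty is that the adversary's freedom in choosing eta(x) breaks the symmetry assumptions that standard heuristics rely on.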