They likely would have seen differences had they compared long-tail queries instead of common ones. With search usage growing at the time, the diversity of searches was only going up.
"they tested was “Internet.” According to Hassan, Excite’s first results were Chinese web pages where the English word “Internet” stood out among a jumble of Chinese characters. Then the team typed “Internet” into BackRub. The first two results delivered pages that told you how to use browsers. It was exactly the kind of helpful result that would most likely satisfy someone who made the query. Bell was visibly upset. The Stanford product was too good. If Excite were to host a search engine that instantly gave people information they sought, he explained, the users would leave the site instantly. Since his ad revenue came from people staying on the site—“stickiness” was the most desired metric in websites at the time—using BackRub’s technology would be counterproductive."