Text-based passwords are a fundamental and popular means of authentication. Password authentication is simple to implement because it does not require any equipment,
unlike biometric authentication, and it relies only on the user’s memory. Therefore, people
often use easy-to-remember passwords, such as “iloveyou1234.” This reliance on memory,
however, is an inherent weakness of passwords, mainly because these easy-to-remember
passwords can also be cracked easily. Despite this well-known weakness, passwords are
still the de facto authentication method for most online systems. Owing to this importance, password cracking has been researched extensively, for both offensive and defensive
purposes. Hashcat and John the Ripper are the most popular cracking tools, allowing
users to crack millions of passwords in a short time using password-cracking dictionaries and rule sets. However, rule-based cracking has a clear limitation: it depends
on password-cracking experts to come up with creative rules. To overcome this limitation, a recent trend has been to apply machine learning techniques
to password cracking. For instance, state-of-the-art password-guessing studies such as
PassGAN adopted a Generative Adversarial Network (GAN) to generate high-quality password guesses without knowledge of password structures. However, compared
to the probabilistic context-free grammar (PCFG), PassGAN showed inferior password-cracking performance in all experimental cases. In addition, PassGAN could not demonstrate
its cracking performance in practical cases (long and complicated passwords).
In this thesis, I propose new methods for achieving improved password-cracking performance, which are based on both the generator and discriminator modules of a GAN. With
respect to the generator of the GAN, I describe new techniques for improving the password-cracking performance of PassGAN. Interestingly, changing both the base neural network
and the hyperparameter configuration of the GAN yields better cracking performance than
PassGAN. In addition, switching to a dual-discriminator architecture further
improves the password-cracking performance. These new approaches are denoted as rPassGAN, rPassD2CGAN, and rPassD2SGAN. In some experimental cases, the
rPassGAN series surpasses PCFG as well.
Through several experiments with rPassGAN, I observed that each password guessing
model has its own cracking space that does not overlap with other models. This observation led me to realize that an optimized candidate dictionary can be made by combining
the password candidates generated by multiple password generation models. The second
technique I suggest is a deep learning-based approach called REDPACK that addresses
the weakness of the cutting-edge GAN-based password-cracking tools. To this end, REDPACK combines multiple password generator models in an effective way. This approach
uses the discriminator of the rPassGAN as the password-candidate selector. Then, by
collecting passwords selectively, REDPACK achieves a more realistic password candidate
dictionary. Also, REDPACK improves password cracking performance by incorporating
both the generator and the discriminator in a GAN framework. I evaluated this model
on various datasets with password candidates composed of symbols, digits, and uppercase and
lowercase letters. The results clearly show that my approach outperforms all existing
approaches, including rule-based Hashcat, GAN-based PassGAN, and probability-based
PCFG. Another advantage of the proposed model is that REDPACK can reduce the number of password candidates by up to one-third or one-fourth, with only a small loss in cracking performance compared to the union set of passwords cracked by the multiple generation models.
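The selection step described above can be illustrated with a minimal sketch. It is not the REDPACK implementation: the function names, the keep_fraction parameter, and in particular toy_score are hypothetical, and toy_score is only a hand-written stand-in for the score that a trained discriminator would assign to each candidate.

```python
from typing import Callable, Iterable, List


def select_candidates(
    candidate_sets: Iterable[Iterable[str]],
    score: Callable[[str], float],
    keep_fraction: float,
) -> List[str]:
    """Merge candidates from several generators and keep the top-scoring share.

    `score` plays the role of the discriminator: higher means "looks more
    like a real password". `keep_fraction` is a hypothetical knob, not a
    parameter taken from the thesis.
    """
    # Union of all generators' outputs, deduplicated in first-seen order.
    union = list(dict.fromkeys(pw for s in candidate_sets for pw in s))
    if not union:
        return []
    # Rank by score, highest first (sorted() is stable for ties).
    ranked = sorted(union, key=score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


def toy_score(pw: str) -> float:
    """Hypothetical stand-in for a trained discriminator (assumption only)."""
    classes = [
        any(c.islower() for c in pw),
        any(c.isupper() for c in pw),
        any(c.isdigit() for c in pw),
        any(not c.isalnum() for c in pw),
    ]
    # Reward character-class variety, plus a small bonus for length.
    return sum(classes) + min(len(pw), 16) / 16


if __name__ == "__main__":
    # Made-up outputs of three generators (e.g. rule-based, GAN, PCFG).
    generators = [["password1", "abc"], ["abc", "Passw0rd!"], ["123456"]]
    print(select_candidates(generators, toy_score, keep_fraction=0.5))
```

In this sketch the dictionary shrinks because only the highest-scoring share of the union is kept; how the actual selector thresholds its candidates is not specified here.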
Finally, I propose iREDPACK, which is the first heterogeneously structured GAN model
in the password-cracking domain and adopts the concept of Google Inception. iREDPACK
is designed for handling passphrase-structured passwords. In all experiments, iREDPACK selects more PCFG-generated password candidates than REDPACK.