
Peipei Wang - BS, MS
- North Carolina State University
About
15
Publications
7,622
Reads
312
Citations
Publications (15)
Regular expressions cause string-related bugs and open security vulnerabilities for DOS attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 regex-rela...
Regular expressions cause string-related bugs and open security vulnerabilities for DOS attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 merged reg...
Developers report testing their regular expressions less than the rest of their code. In this work, we explore how thoroughly tested regular expressions are by examining open source projects.
Using standard metrics of coverage, such as line and branch coverage, gives an incomplete picture of the test coverage of regular expressions. We adopt graph-...
Cloud server systems such as Hadoop and Cassandra have enabled many real-world data-intensive applications running inside computing clouds. However, those systems present many data-corruption and performance problems which are notoriously difficult to debug due to the lack of diagnosis information. In this paper, we present DScope, a tool that stat...
Server applications running inside production cloud infrastructures are prone to various performance problems (e.g., software hang, performance slowdown). When those problems occur, developers often have little clue to diagnose those problems. In this paper, we present Hytrace, a novel hybrid approach to diagnosing performance problems in producti...
Server applications running inside production cloud infrastructures are prone to various performance problems (e.g., software hang, performance slow down). When those problems occur, developers often have little clue to diagnose those problems. We present HyTrace, a novel hybrid approach to diagnosing performance problems in production cloud infras...
Security isolation is a foundation of computing systems that enables resilience to different forms of attacks. This article seeks to understand existing security isolation techniques by systematically classifying different approaches and analyzing their properties. We provide a hierarchical classification structure for grouping different security i...
Cloud computing has achieved great success in modern IT industry as an excellent computing paradigm due to its flexible management and elastic resource sharing. To date, cloud computing takes an irreplaceable position in our socioeconomic system and influences almost every aspect of our daily life. However, it is still in its infancy, many problems...
Big data processing is one of the killer applications for cloud systems. MapReduce systems such as Hadoop are the most popular big data processing platforms used in the cloud system. Data corruption is one of the most critical problems in cloud data processing, which not only has serious impact on the integrity of individual application results but...
Infrastructure-as-a-service clouds are becoming widely adopted. However, resource sharing and multi-tenancy have made performance anomalies a top concern for users. Timely debugging those anomalies is paramount for minimizing the performance penalty for users. Unfortunately, this debugging often takes a long time due to the inherent complexity and...
Questions (8)
The confused deputy problem is a type of privilege escalation, and the principle of least privilege grants a program only the privileges it needs. Can least privilege therefore solve the confused deputy problem?
If yes, how is least privilege used to solve it? If no, why can it not solve the problem?
Thanks in advance for your answer.
I am reading a paper on file systems that mentions a concept called "directory indirection". Can anyone explain how it works? Also, how does it differ from the classical file system structure, and what are its benefits?
While making use of Hadoop's computing capacity, people pay little attention to its security. Are there any academic papers on Hadoop security? I know that several commercial Hadoop products have their own security models, but so far I have not found any journal or conference paper that discusses Hadoop security problems.
I know the Cloudera and Hortonworks Hadoop distributions. Are there other Hadoop distributions you can recommend? Do we need to buy these products, or are they open source?
A clarification: thanks for letting me know that Apache Hadoop is open source, but I am looking for Hadoop variants for research purposes.
I know that forward slicing is only for static program analysis, while taint analysis can be used for both static and dynamic program analysis. Can anyone tell me how the two differ in the static case? As far as I can see, both of them find all the variables influenced by the input.
The paper "Apache Hadoop YARN: Yet Another Resource Negotiator" claims that YARN can scale the number of nodes from 4000 in Hadoop 1.x to over 7000. What is the key principle behind this improvement in scalability?
In Hadoop 1.x, there is a centralized NameNode and a centralized JobTracker. It seems that things have not changed much in YARN, since it also has a centralized ResourceManager. The bottleneck in Hadoop is still the design of centralized control.
Memory randomization is a way to protect memory from security attacks, and also a way to avoid false sharing on cache lines. But the common way to change the layout is to add padding, at the expense of cache locality. Are there other ways to randomize memory layout while still benefiting from cache locality?
Peer-to-peer systems often adopt a decentralized organization, and each client serves as a server as well. If a system is decentralized, can we say that all nodes play equal roles? Conversely, can we say a system is decentralized if all nodes work as servers?
Please explain the differences and give me some examples, preferably of real-world systems.

















