Subversion Glossary


APR
Subversion is built on a portability layer called APR (the Apache Portable Runtime library). This means Subversion should work on any operating system that the Apache httpd server runs on: Windows, Linux, all flavors of BSD, Mac OS X, NetWare, and others.

Branch
A branch is a copy of an existing tree of directories and files. A branch always begins life as a copy of something and moves on from there, generating its own history. Branches are usually created to try out new features without disturbing the main line of development with compiler errors and bugs.

Checkout
Checking out a repository creates a copy of a desired branch on your local machine. By default this copy contains the latest revision of the repository, but you can specify an earlier revision instead.

Commit
Committing a file means that the changes you have made in your local working copy are sent to the repository. Once a commit is made, other users see the latest version of that file after they run an Update.

Conflict
Sometimes, when you update your files from the repository, you may get a conflict. A conflict occurs when two or more users have changed the same lines of a file.
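To make the idea of "the same lines" concrete, here is a minimal Python sketch that checks whether two users' edits of a common base version overlap. It is a simplified illustration and not Subversion's actual merge algorithm. The function names `changed_ranges` and `has_conflict` are hypothetical. By assumption, two pure insertions at the same position also count as a conflict.

```python
from difflib import SequenceMatcher

def changed_ranges(base, edited):
    """Return half-open (start, end) ranges of base lines touched by an edit."""
    matcher = SequenceMatcher(None, base, edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]

def _overlaps(a, b):
    (s1, e1), (s2, e2) = a, b
    # Assumption: two edits starting at the same base line (including
    # two insertions at the same point) are treated as overlapping.
    return (s1 < e2 and s2 < e1) or s1 == s2

def has_conflict(base, mine, theirs):
    """True if both users changed at least one of the same base lines."""
    return any(_overlaps(a, b)
               for a in changed_ranges(base, mine)
               for b in changed_ranges(base, theirs))

base = ["a", "b", "c"]
print(has_conflict(base, ["a", "B", "c"], ["a", "b2", "c"]))  # True: both edit line 2
print(has_conflict(base, ["a", "B", "c"], ["a", "b", "C"]))   # False: different lines
```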

Hook
A hook is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. A hook is handed enough information to tell what that event is, what targets it is operating on and the username of the person who triggered the event.
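As an illustration, here is a minimal sketch of a pre-commit hook written in Python. Subversion runs `hooks/pre-commit` with two arguments, the repository path and the name of the pending transaction. It rejects the commit if the hook exits with a non-zero status. The sketch reads the pending log message with `svnlook log -t`. The length policy and the names `log_message_ok` and `MIN_LOG_LENGTH` are hypothetical.

```python
#!/usr/bin/env python3
import subprocess
import sys

MIN_LOG_LENGTH = 10  # hypothetical policy: require a descriptive message

def log_message_ok(message: str, min_length: int = MIN_LOG_LENGTH) -> bool:
    """Return True if the log message satisfies the (hypothetical) policy."""
    return len(message.strip()) >= min_length

def main(repos: str, txn: str) -> int:
    # Read the log message of the uncommitted transaction.
    result = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    if not log_message_ok(result.stdout):
        # On failure, stderr output is relayed to the committing client.
        sys.stderr.write("Commit rejected: please write a descriptive log message.\n")
        return 1
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    # Subversion invokes the hook as: pre-commit REPOS-PATH TXN-NAME
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

On Unix-like systems the script would be saved as `hooks/pre-commit` inside the repository and made executable.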

Lock
A lock is a mechanism by which a user requests the exclusive right to change a file in the repository.

Merging
Merging refers to applying the changes made on one branch to another branch, which may be the trunk, or vice versa.

Repository
The core of Subversion is the repository, a central store for sharing data. The repository stores information in the form of a filesystem tree: a hierarchy of directories and files. Any number of clients can connect to the repository and read or write these files.

Repository Browser
In certain cases, one might need to work directly on the repository without having a working copy. This is where the repository browser comes in. It looks much like an Explorer window, with icons and an address bar in which you type the repository URL. It also offers commands such as Copy, Move, and Delete.

Repository URLs
Repositories can be accessed through different methods: directly on your local disk or through network protocols. A repository location is always given as a URL. These URLs use standard URL syntax, which allows server names and port numbers to be specified as part of the URL.
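The standard URL syntax means a repository URL can be taken apart with ordinary URL tools. The sketch below uses Python's `urllib.parse.urlsplit` to extract scheme, host, port, and path. It covers the common Subversion access schemes (`file`, `http`, `https`, `svn`, `svn+ssh`). The default-port table is an assumption of this sketch: 3690 for svnserve, plus the usual HTTP/HTTPS/SSH ports. The host `svn.example.com` and the function name `describe_repo_url` are hypothetical.

```python
from urllib.parse import urlsplit

# Access schemes commonly accepted by Subversion clients.
SVN_SCHEMES = {"file", "http", "https", "svn", "svn+ssh"}
# Assumed default ports when the URL does not give one.
DEFAULT_PORTS = {"http": 80, "https": 443, "svn": 3690, "svn+ssh": 22}

def describe_repo_url(url):
    """Split a repository URL into scheme, host, port and repository path."""
    parts = urlsplit(url)
    if parts.scheme not in SVN_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None for local file:/// URLs
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
    }

print(describe_repo_url("svn://svn.example.com:3690/repos/project/trunk"))
print(describe_repo_url("file:///var/svn/repos"))
```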

Revert
If, after examining a file, you decide to discard the changes you have made to it, you can use the Revert command to return the file to the version you last updated to.

Revision
Each time the repository accepts a commit, it creates a new version of the filesystem tree called a revision. Each revision is assigned a unique number, one greater than the number of the previous revision. The initial revision of a freshly created repository is numbered zero and consists of nothing but an empty root directory.

Revision Graph
A revision graph is a graphical representation of the trunk and of the branches and tags that split off from it. It resembles a tree structure and makes this kind of information easy to view.

Revision number
When you create a new repository, it begins its life at revision zero and each successive commit increases the revision number by one. After a commit is made, the Subversion client informs you of the new revision number. Each revision number has a tree hanging below it and each tree is a snapshot of the way the repository looks after each commit.
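The numbering rules above can be modeled in a few lines. This is a toy Python sketch, not how Subversion stores data: the hypothetical `ToyRepository` class keeps a full snapshot of the tree per revision, whereas real Subversion stores changes far more compactly. It shows that revision 0 is an empty root, that each commit returns the previous number plus one, and that every revision number names a complete tree.

```python
class ToyRepository:
    """Toy model: every commit stores a full snapshot of the tree."""

    def __init__(self):
        # Revision 0: nothing but an empty root directory.
        self._snapshots = [{}]

    @property
    def youngest(self):
        """Number of the most recent revision."""
        return len(self._snapshots) - 1

    def commit(self, changes, deletions=()):
        """Apply changes to the latest tree; return the new revision number."""
        tree = dict(self._snapshots[-1])
        tree.update(changes)
        for path in deletions:
            tree.pop(path, None)
        self._snapshots.append(tree)
        return self.youngest  # always previous revision + 1

    def tree_at(self, revision):
        """The whole tree as it looked right after `revision`."""
        return dict(self._snapshots[revision])

repo = ToyRepository()
print(repo.commit({"trunk/README": "hello"}))  # 1
print(repo.commit({"trunk/main.c": "int main(void) { return 0; }"}))  # 2
print(repo.tree_at(1))  # {'trunk/README': 'hello'}
```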

Switch
The switch subcommand updates your working copy to mirror a new URL, usually a URL that shares a common ancestor with your working copy. It is the Subversion way to move a working copy to a new branch.

Tagging
Tagging is essentially placing a label on a set of files as they are at a given moment, whatever revision number each file has. In Subversion, a tag is created as a cheap copy of the tree. This can be done either from a working copy or directly in the repository; the effect is the same.

Update
An update synchronizes the working copy with the latest changes made to the repository by any user. It fetches the latest versions of the files to your local drive. As a rule of thumb, always update a file before making any changes to it.

Working copy
A working copy is the local copy of the file(s) you have fetched from the repository, together with any changes you have made to them. To obtain a working copy, you perform a Checkout.

Irregular Verbs


Base form   Simple past tense   Past participle
awake awoke awoken
be was, were been
bear bore borne/born
beat beat beat
become became become
begin began begun
bend bent bent
beset beset beset
bet bet bet
bid bid/bade bid/bidden
bind bound bound
bite bit bitten
bleed bled bled
blow blew blown
break broke broken
breed bred bred
bring brought brought
broadcast broadcast broadcast
build built built
burn burned/burnt burned/burnt
burst burst burst
buy bought bought
cast cast cast
catch caught caught
choose chose chosen
cling clung clung
come came come
cost cost cost
creep crept crept
cut cut cut
deal dealt dealt
dig dug dug
dive dived/dove dived
do did done
draw drew drawn
dream dreamed/dreamt dreamed/dreamt
drive drove driven
drink drank drunk
eat ate eaten
fall fell fallen
feed fed fed
feel felt felt
fight fought fought
find found found
fit fit fit
flee fled fled
fling flung flung
fly flew flown
forbid forbade forbidden
forget forgot forgotten
forego (forgo) forewent foregone
forgive forgave forgiven
forsake forsook forsaken
freeze froze frozen
get got gotten
give gave given
go went gone
grind ground ground
grow grew grown
hang hung hung
hear heard heard
hide hid hidden
hit hit hit
hold held held
hurt hurt hurt
keep kept kept
kneel knelt knelt
knit knit knit
know knew known
lay laid laid
lead led led
leap leaped/leapt leaped/leapt
learn learned/learnt learned/learnt
leave left left
lend lent lent
let let let
lie lay lain
light lighted/lit lighted/lit
lose lost lost
make made made
mean meant meant
meet met met
misspell misspelled/misspelt misspelled/misspelt
mistake mistook mistaken
mow mowed mowed/mown
overcome overcame overcome
overdo overdid overdone
overtake overtook overtaken
overthrow overthrew overthrown
pay paid paid
plead pleaded/pled pleaded/pled
prove proved proved/proven
put put put
quit quit quit
read read read
rid rid rid
ride rode ridden
ring rang rung
rise rose risen
run ran run
saw sawed sawed/sawn
say said said
see saw seen
seek sought sought
sell sold sold
send sent sent
set set set
sew sewed sewed/sewn
shake shook shaken
shave shaved shaved/shaven
shear sheared sheared/shorn
shed shed shed
shine shone shone
shoe shoed shoed/shod
shoot shot shot
show showed showed/shown
shrink shrank shrunk
shut shut shut
sing sang sung
sink sank sunk
sit sat sat
sleep slept slept
slay slew slain
slide slid slid
sling slung slung
slit slit slit
smite smote smitten
sow sowed sowed/sown
speak spoke spoken
speed sped sped
spend spent spent
spill spilled/spilt spilled/spilt
spin spun spun
spit spit/spat spit/spat
split split split
spread spread spread
spring sprang/sprung sprung
stand stood stood
steal stole stolen
stick stuck stuck
sting stung stung
stink stank stunk
stride strode stridden
strike struck struck
string strung strung
strive strove striven
swear swore sworn
sweep swept swept
swell swelled swelled/swollen
swim swam swum
swing swung swung
take took taken
teach taught taught
tear tore torn
tell told told
think thought thought
thrive thrived/throve thrived
throw threw thrown
thrust thrust thrust
tread trod trodden
understand understood understood
uphold upheld upheld
upset upset upset
wake woke woken
wear wore worn
weave weaved/wove weaved/woven
wed wed wed
weep wept wept
wind wound wound
win won won
withhold withheld withheld
withstand withstood withstood
wring wrung wrung
write wrote written

A Survey and Taxonomy of Approaches for Mining Software Repositories in the Context of Software Evolution

By: H. Kagdi, M. L. Collard, and J. I. Maletic
The authors identify four dimensions in order to objectively describe and compare the different approaches:
  1. The software repositories utilized: what information sources are used?
  2. The purpose of MSR: why mine or what to mine for?
  3. The methodology: how to achieve the purpose of mining from the selected software repositories?
  4. The evaluation of the undertaken approach: how to assess quality?
What types of sources can be considered as software repositories?
  • source-control systems are used to store and manage changes to source code artifacts, typically files, under evolution.
  • defect-tracking systems are used to manage the reporting and resolution of defects/bugs/faults and/or feature enhancements.
  • archived communications among project participants record discussions about the project, which makes them sources of information such as change rationales.
These repositories share a common goal: supporting software evolution by managing the lifecycle of a software change.

Given the primacy of source code change, there are three basic categories of information in a software repository that can be mined:
  1. the software artifacts/versions
  2. the differences between the artifacts/versions
  3. the metadata about the software change
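The three categories can be seen side by side in one small record. This is a hedged Python sketch, not a structure from the survey. The hypothetical `ChangeRecord` holds two versions of an artifact (category 1). It derives their line-level differences with the standard `difflib` module (category 2) and exposes metadata about the change (category 3).

```python
import difflib
from dataclasses import dataclass

@dataclass
class ChangeRecord:
    """Hypothetical record of one change to one file."""
    revision: int
    author: str
    log_message: str
    old_version: list  # category 1: artifact before the change (lines)
    new_version: list  # category 1: artifact after the change (lines)

    def differences(self):
        """Category 2: line-level differences between the two versions."""
        return list(difflib.unified_diff(
            self.old_version, self.new_version,
            fromfile=f"r{self.revision - 1}", tofile=f"r{self.revision}",
            lineterm=""))

    def metadata(self):
        """Category 3: data about the change itself."""
        return {"revision": self.revision, "author": self.author,
                "log": self.log_message}

rec = ChangeRecord(2, "alice", "Rename x to y",
                   ["x = 1", "print(x)"], ["y = 1", "print(y)"])
print("\n".join(rec.differences()))
```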
Topics I am interested in:

4.9. Classification with supervised learning

4.9.1. Maintenance relevance relations

A classification-learning technique is used by Shirabad et al. [37–39] to determine the co-update relations between a pair of source code files, i.e., given two files determine whether a change in one leads to a change in the other. Such types of relations are also termed maintenance-relevance relations. A decision-tree classifier (i.e., model) is produced by a machine-learning (induction) algorithm. A time-based heuristic is employed to assign a relevant or non-relevant relation between a pair of files to form the learning and testing sets. A fixed time period between time T1 and T2 (T2
rate versus true-positive rate), precision, and recall plots imply that the PR attributes generate better classifiers than the syntactic attributes do. The classifiers generated from the comment attributes do not perform on a par with those generated from the PR attributes; however, they are better than those generated from the syntactic attributes. The classifiers generated from a combination of syntactic and comment attributes produce better results than either set of attributes considered alone.
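The notes above cut off the exact definition of the time-based heuristic, so the following Python sketch rests on an assumption. It labels a pair of files "relevant" if the two were changed together in at least one commit whose timestamp falls in [T1, T2], and "not relevant" otherwise. This only builds labeled pairs for a learning set. The decision-tree induction step is not shown, and `label_file_pairs` is a hypothetical name.

```python
from itertools import combinations

def label_file_pairs(commits, files, t1, t2):
    """Label every pair of files as 'relevant' or 'not relevant'.

    commits: iterable of (timestamp, set_of_changed_files).
    ASSUMPTION: a pair is relevant if both files changed in the same
    commit at least once with t1 <= timestamp <= t2.
    """
    co_updated = set()
    for timestamp, changed in commits:
        if t1 <= timestamp <= t2:
            co_updated.update(combinations(sorted(changed), 2))
    return {pair: ("relevant" if pair in co_updated else "not relevant")
            for pair in combinations(sorted(files), 2)}

commits = [(1, {"a.c", "b.c"}), (5, {"b.c", "c.c"}), (20, {"a.c", "c.c"})]
print(label_file_pairs(commits, {"a.c", "b.c", "c.c"}, t1=0, t2=10))
```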

4.9.2. Triage bug reports

Anvik et al. [63] used supervised learning (i.e., a support vector machine algorithm) to recommend a list of potential developers for resolving a BR (bug report). Past reports in the Bugzilla repository are used to produce a classifier. The authors developed project-specific heuristics to train the classifier instead of directly using the assigned-to field of a BR. This was done to avoid incorrect assignments of BRs with default assignees, which may not reflect the developer who actually resolved a bug. The approach is evaluated on three open-source projects: Eclipse, Firefox, and GCC. For Eclipse and Firefox, developers who contributed at least nine BR resolutions over the most recent three months were included in the training set. The precision for Eclipse and Firefox was 57% and 64%, respectively, and the recall was 7% and 2%, respectively. For GCC, the precision was 6% when recommending one developer and 18% when recommending two or three developers. The recall for GCC was 0.3%, 2%, and 3% when recommending one, two, and three developers, respectively.
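The precision and recall figures above are easier to read with the set-based definitions spelled out. This is a minimal Python sketch of those standard definitions for a single bug report. It does not reproduce how Anvik et al. determined the set of appropriate developers, and the developer names are hypothetical.

```python
def precision_recall(recommended, appropriate):
    """Set-based precision and recall for one bug report.

    recommended: developers suggested by the classifier (e.g., top-k).
    appropriate: developers considered able to resolve the report.
    """
    recommended, appropriate = set(recommended), set(appropriate)
    hits = len(recommended & appropriate)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(appropriate) if appropriate else 0.0
    return precision, recall

# One of two suggestions is correct, but three other capable developers are missed.
print(precision_recall(["alice", "bob"], ["bob", "carol", "dave", "erin"]))  # (0.5, 0.25)
```

The example shows how precision and recall can diverge: when many developers could resolve a report, a short recommendation list can be mostly correct while still covering only a small fraction of them.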

5. DISCUSSION AND OPEN ISSUES

5.1. MSR on fine-grained entities

One major issue is the disparity between the software-evolution data available in the repositories and the needs of the stakeholders, not just researchers but also software maintainers. The majority of current MSR approaches operate either at the physical level (e.g., system, subsystems, directories, files, lines) or at a fairly high level of logical/syntactic entities (e.g., classes). This is regardless of the primary focus, i.e., changes of properties or artifacts. In part this is due to researchers restricting their approaches/studies to what is directly available and supported by the software repositories (e.g., the file and line view of source code and their differences). However, the investigations by Zimmermann et al. [33] have shown the benefits of further processing the information directly available from source code repositories for change-prediction and impact-analysis tasks. In their study [33], there was no significant difference in precision and recall values between file-based and logical-based entities (i.e., classes, methods, and variables) with respect to change-prediction tasks. However, there is an implicit gain in terms of the context available to the maintainer, for example, the exact location of a predicted change. Predicting a change at the entity level rather than the file level reduces manual effort, as only the predicted entities (versus the whole file) need to be examined.

This leads to the issue of extending current MSR by increasing source code awareness. The issue of source code awareness could be twofold, with regard to the types of MSR questions and to the source code artifacts and differences. For example, on one end, a market-basket question is used to find logical/evolutionary couplings between source code entities. These couplings are termed ‘hidden’ dependencies, as they are based solely on the historical information of software changes. However, very little attention has been paid to whether these hidden dependencies correspond to relationships present in well-established source code models (e.g., control-flow graphs, dependency graphs, call graphs, and UML models). We feel that a finer-grained understanding of source code changes is needed to address these types of questions.

Fluri et al. [106] analyzed change-sets from a CVS repository to distinguish changes within source code entities such as classes and methods (termed structural changes) from changes to license updates and white space between source code entities (termed non-structural changes). The goal of their work was to refine the evolutionary couplings detected from the version history with this information (i.e., to reduce false positives). Their study on an Eclipse plugin found over 31% of change-sets with no structural changes and over 51% of change-sets with at least one non-structural change. In one of the rare cases, Ying et al. [34] defined an interestingness measure for evolutionary couplings based on source code dependencies such as calls, inheritance, and usage. Their study on Eclipse and Mozilla found evolutionary couplings that were not represented by the source code dependencies they considered. We feel that further utilizing such source code dependencies (such as the association and dependency relationships defined in UML) will result in heuristics and criteria that further reduce false evolutionary couplings. It will also help to detect evolutionary couplings that are prevalent but do not exhibit any source code dependencies (e.g., domain- or developer-induced dependencies). More studies in this direction are needed to realize the exclusive and synergistic contributions of MSR approaches.
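A market-basket question over change-sets can be sketched with support and confidence counts. This is a minimal Python illustration of the general idea, not the tool used by Zimmermann et al. or Ying et al. The function name, the `min_support` threshold, and the file names are hypothetical. The rules it finds are exactly the ‘hidden’ dependencies discussed above: they come from co-change history only, not from any source code model.

```python
from collections import Counter
from itertools import combinations

def co_change_rules(change_sets, min_support=2):
    """Mine simple 'A changed => B changed' rules from change-sets.

    support(A, B)      = number of change-sets containing both A and B
    confidence(A => B) = support(A, B) / number of change-sets containing A
    Returns {(A, B): (support, confidence)} for pairs meeting min_support.
    """
    item_count, pair_count = Counter(), Counter()
    for change_set in change_sets:
        items = sorted(set(change_set))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = {}
    for (a, b), support in pair_count.items():
        if support >= min_support:
            rules[(a, b)] = (support, support / item_count[a])
            rules[(b, a)] = (support, support / item_count[b])
    return rules

history = [{"Parser.java", "Lexer.java"},
           {"Parser.java", "Lexer.java", "Main.java"},
           {"Parser.java"}]
for (a, b), (sup, conf) in co_change_rules(history).items():
    print(f"{a} => {b}: support={sup}, confidence={conf:.2f}")
```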

5.2. Historical context: how many versions?

Software repositories bring a rich history of software development and evolution. One goal of MSR is to uncover past successes and failures from historical information and to improve the evolution process of the software system(s) under consideration. However, one needs to be careful when selecting the amount and period of historical data on which to base tools or models supporting a particular aspect of software evolution. Considering development data from too far back in the history imposes a risk of irrelevant information: the design or operational assumptions of the system may no longer be similar, or worse, may be entirely different. For example, consider a hypothetical system that has undergone 1000 versions. The information about the changes in the first 50 versions may be totally irrelevant for predicting the changes in version 1001, and a series of changes from version 50 to version 200 could be attributed to an unstable unit in the system that has since stabilized. On the other hand, considering too few versions of the system imposes the risk of incomplete data, missing important relevant information and thus producing few useful results. For example, the current version of a system may be in the middle of a refactoring that is achieved through a sequence of changes (versions). The minimum requirement would then be the past versions from the point where the refactoring started, first to confirm the kind of refactoring taking place and then to predict the remaining steps. The number of versions to mine depends on the task and on the current state/phase of the system under consideration.

5.3. Threats to validity in MSR

MSR approaches use a variety of software repositories, ask different questions, and draw conclusions within the context of the conducted study. All of these factors are subject to threats to validity. Gasser et al. [16] identified the challenges associated with a need common among researchers: selecting, gathering, and maintaining the raw data of open-source projects for their respective investigations. They suggested a research infrastructure to deal with such challenges and to serve as a benchmark that facilitates comparative and collaborative research. They discussed the infrastructure with regard to representation standards for the data and metadata available in various software repositories, linking them, the required tools, and a centralized data repository. German et al. further suggested that a set of projects representing various sizes and domains, their extracted source code facts (i.e., syntax and semantics), and the period of history and observation considered for these projects be benchmarked [10,18]. We call for a comparative framework to objectively compare MSR approaches with regard to the aspects of software evolution, the MSR questions, and the results. Such a framework will facilitate more generic conclusions in MSR research. Currently, it is difficult to tell whether two independent MSR investigations are asking equivalent questions or studying the same or similar aspects of software evolution. A benchmark of this nature would help address the expressiveness and effectiveness of MSR in improving software evolution.

J. Neville, F. Provost, "Predictive Modeling with Social Networks"

Jennifer Neville, Purdue University
Foster Provost, New York University

Abstract
Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks. The dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding how to market a product to people in MySpace or Facebook, it may be helpful to consider whether a person's friends are likely to purchase the product.

This tutorial will explore the unique opportunities and challenges for modeling social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., marketing, fraud detection). We will then present a number of characteristics of social network data that differentiate it from traditional inference and learning settings, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of the opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data. We will give links to the recent literature to guide study, and present results demonstrating the effectiveness of the techniques.

Prerequisites: The tutorial assumes a basic knowledge of AI-style inference and machine learning, equivalent to an introductory graduate or advanced undergraduate class.


Ref: http://www.sigkdd.org/kdd2008/tutorials.html
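The abstract's example of asking whether a person's friends are likely to buy a product can be made concrete with a very simple homophily-based estimate. The Python sketch below is a weighted vote over a node's labeled neighbors, a basic relational-neighbor baseline. It is not one of the specific techniques covered in the tutorial. The graph, the weights, and the function name `neighbor_vote` are hypothetical.

```python
def neighbor_vote(graph, labels, node):
    """Estimate P(label = 1) for `node` from its labeled neighbors.

    graph:  dict mapping node -> {neighbor: edge_weight}
    labels: dict mapping node -> 0 or 1 (e.g., 1 = bought the product)
    Unlabeled neighbors are ignored; returns None if none are labeled.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for neighbor, weight in graph.get(node, {}).items():
        if neighbor in labels:
            total_weight += weight
            weighted_sum += weight * labels[neighbor]
    return weighted_sum / total_weight if total_weight else None

# Hypothetical friendship graph; "cal" is a closer friend (weight 2).
graph = {"dana": {"ann": 1.0, "ben": 1.0, "cal": 2.0, "eve": 1.0}}
labels = {"ann": 1, "ben": 0, "cal": 1}  # eve's behavior is unknown
print(neighbor_vote(graph, labels, "dana"))  # 0.75
```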

Mining Recurrent Activities: Fourier Analysis of Change Events

By: Abram Hindle, Michael W. Godfrey, and Richard C. Holt

The periods used in time-series analysis are mostly days, months, or years. As yet there has been no research to validate that these assumptions are reasonable, so the authors use Fourier analysis to find the "natural" periodicities of software development. Fourier analysis can detect and determine the periodicity of repeating events.
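The core idea of finding a natural period can be shown with a plain discrete Fourier transform. This is a minimal Python sketch and not the paper's actual procedure, which these notes do not describe. It finds the strongest non-constant frequency in a series of change-event counts. The synthetic series assumes busy weekdays and quiet weekends, and the function name `dominant_period` is hypothetical.

```python
import cmath

def dominant_period(counts):
    """Return the period (in samples) of the strongest non-constant frequency.

    counts: change events per time step (e.g., commits per day).
    Uses a plain O(n^2) discrete Fourier transform for clarity.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two samples")
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0, which is just the mean level
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(counts))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k

# Synthetic history: 30 weeks of 10 commits per weekday, 1 per weekend day.
daily = [10 if day % 7 < 5 else 1 for day in range(210)]
print(dominant_period(daily))  # 7.0
```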

ICSE 2009

Vocabulary

mitigate: to lessen, to make less severe
duality: the state of existing as a pair
coefficient: a constant factor that multiplies a variable
vague: unclear, ambiguous
Thanks to Lexitron

Definition of terms

approach: a way of reaching a goal; a broad, general direction

method: a technique or working procedure (a sequence of steps)

Linear Predictive Coding and Cepstrum coefficients for mining time variant information from software repositories

By: Giuliano Antoniol, Vincenzo Fabio Rollo, and Gabriele Venturi

Abstract: proposes an approach to recover time-variant information from software repositories.
There is currently a shortage of techniques and methods for extracting time-dependent information, whereas signal processing, image processing, and speech recognition have methods for representing how data differ at particular points in time.
This research uses LPC (Linear Predictive Coding) and cepstrum coefficients to model time-varying software artifact histories and to highlight components and artifacts that evolved in the same or similar ways.
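To see what "LPC and cepstrum coefficients" means for a single history, here is a textbook Python sketch. It estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion, and compares two histories by Euclidean distance between their cepstra. The paper's actual choices (model order, windowing, normalization, distance measure) are not given in these notes, so everything here is an assumption. The function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (no windowing)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin: predictor a_1..a_p with x[n] ~ sum_k a_k * x[n - k]."""
    r = autocorrelation(x, order)
    if r[0] == 0:
        raise ValueError("all-zero series has no LPC model")
    a, error = [0.0] * order, r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / error  # reflection coefficient for order i + 1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a, error = new_a, error * (1.0 - k * k)
    return a

def lpc_to_cepstrum(a, n_ceps):
    """c_n = a_n + sum_{k=max(1,n-p)}^{n-1} (k/n) c_k a_{n-k}, a_n = 0 for n > p."""
    p, c = len(a), []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def cepstral_distance(c1, c2):
    """Euclidean distance between cepstral vectors (smaller = more alike)."""
    return sum((u - v) ** 2 for u, v in zip(c1, c2)) ** 0.5

# A decaying history (e.g., changes per month) behaves like x[n] = 0.9 x[n-1].
history = [0.9 ** n for n in range(200)]
coeffs = lpc(history, 2)
print(coeffs, lpc_to_cepstrum(coeffs, 3))
```

Two artifacts whose histories yield nearby cepstral vectors would be candidates for having "evolved in similar ways".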

MSR 2005

Coming soon
