~ PoHuA ~: August 2009

Subversion Glossary

From: http://www.openoffice.org/scdocs/ddSVN_svnglossary.html.en

APR

Subversion is built on a portability layer called APR (Apache Portable Runtime library). This means Subversion should work on any operating system that the Apache httpd server runs on: Windows, Linux, all flavors of BSD, Mac OS X, Netware and others.

Branch

A branch refers to a copy of an existing original tree of directories and files. A branch always begins life as a copy of something, and moves on from there, generating its own history. Branches are usually created to try out new features without disturbing the main branch of development with compiler errors and bugs.

Checkout

Checking out a repository creates a copy of a desired branch on your local machine. By default, this copy contains the latest revision of the repository, unless you specify a particular revision.

Commit

Committing a file sends the changes you have made in your local copy to the repository. Once a commit is made, other users can see the latest version of that file after they do an "Update".

Conflict

Sometimes, when you update your files from the repository, you may get a conflict. A conflict occurs when two or more users have changed the same lines of a file, so their changes cannot be merged automatically.
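
To make the idea of a conflict concrete, here is a minimal Python sketch of a line-based three-way comparison. It rests on a simplifying assumption: edits only replace lines in place, so the base version, your version and the repository version all have the same number of lines. Subversion's real merge does not work this way. A line changed on only one side is taken from that side. A line changed differently on both sides is a conflict. The function name merge_lines is hypothetical, and so is the marker text, which is only loosely modeled on the markers Subversion writes into conflicted files.

```python
def merge_lines(base, mine, theirs):
    """Three-way, line-by-line merge (sketch; in-place line edits only).

    Returns (merged_lines, conflict_indices). A conflicting line is replaced
    by hypothetical marker text showing both versions.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # unchanged, or changed identically on both sides
            merged.append(m)
        elif m == b:      # only the repository side changed this line
            merged.append(t)
        elif t == b:      # only the local side changed this line
            merged.append(m)
        else:             # changed differently on both sides: a conflict
            conflicts.append(i)
            merged.append(f"<<<<<<< mine\n{m}\n=======\n{t}\n>>>>>>> theirs")
    return merged, conflicts


base   = ["int x = 1;", "int y = 2;", "int z = 3;"]
mine   = ["int x = 1;", "int y = 20;", "int z = 3;"]
theirs = ["int x = 10;", "int y = 2;", "int z = 3;"]
print(merge_lines(base, mine, theirs))
# (['int x = 10;', 'int y = 20;', 'int z = 3;'], [])
```

The real merge first aligns the files with a diff algorithm, so it can handle inserted and deleted lines. This sketch skips that step to keep the conflict rule itself visible.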

Hook

A hook is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. A hook is handed enough information to tell what that event is, what targets it is operating on and the username of the person who triggered the event.
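
Changing an unversioned (revision) property triggers the pre-revprop-change hook. Subversion passes this hook five arguments: the repository path, the revision, the username, the property name, and an action code of A (added), M (modified) or D (deleted). Exit status 0 allows the change. Below is a minimal sketch of the decision logic such a hook might contain. The function name and the policy are assumptions for illustration: the policy allows only modifications of svn:log, meaning log-message edits.

```python
def revprop_change_allowed(argv):
    """Decide a pre-revprop-change request (sketch).

    argv mirrors the hook's command line:
    [script, REPOS-PATH, REV, USER, PROPNAME, ACTION],
    where ACTION is "A" (added), "M" (modified) or "D" (deleted).
    Returns (exit_status, message); exit status 0 allows the change.
    """
    if len(argv) != 6:
        return 1, "pre-revprop-change: expected 5 arguments"
    _script, _repos, rev, user, propname, action = argv
    # Hypothetical policy: only edits to an existing log message are allowed.
    if propname == "svn:log" and action == "M":
        return 0, ""
    return 1, f"r{rev}: {user} may not change {propname} (action {action})"


print(revprop_change_allowed(["hook", "/srv/svn/repos", "42", "alice", "svn:log", "M"]))
# (0, '')
```

To use this as an actual hook, you would install it as hooks/pre-revprop-change (executable, on Unix-like systems). Its entry point would write the message to stderr and exit with the returned status. Subversion rejects the change on a non-zero exit and shows the stderr text to the client.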

Lock

A lock is a mechanism by which a user asks for the exclusive right to change a file, so that others cannot commit changes to it while the lock is held.

Merging

Merging is the act of bringing changes made on one branch into another branch (which may be the trunk), for example from a feature branch into the trunk, or vice versa.

Repository

The core of Subversion is a Repository. It is a centralized system to store and share data. The repository stores information in the form of a set of trees and branches, a hierarchy of directories and files. Any number of clients can connect to the repository and read or write to these files.

Repository Browser

In certain cases, you might need to work directly on the repository without having a working copy. This is where the repository browser comes in. It looks much like an Explorer window, with icons and an address bar in which you type the repository URL. It also offers commands such as Copy, Move and Delete.

Repository URLs

Repositories can be accessed on your local disk or through network protocols. A repository location is always given as a URL. These URLs use standard URL syntax, which lets you specify the server name and port number.
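
The access method is encoded in the URL scheme. Subversion uses file:// (direct access on local disk), http:// and https:// (WebDAV through Apache httpd), svn:// (an svnserve server) and svn+ssh:// (the svn protocol tunneled through SSH). The minimal sketch below uses Python's standard urllib.parse to split such a URL into scheme, host, port and path. The describe function and the example host names are hypothetical.

```python
from urllib.parse import urlsplit

# Subversion URL schemes and what each one means.
ACCESS_METHODS = {
    "file": "direct repository access on local disk",
    "http": "WebDAV protocol via Apache httpd",
    "https": "WebDAV via Apache httpd, with SSL encryption",
    "svn": "custom protocol to an svnserve server",
    "svn+ssh": "svn protocol tunneled through SSH",
}


def describe(url):
    """Split a repository URL into (scheme, host, port, path, access method)."""
    parts = urlsplit(url)
    method = ACCESS_METHODS.get(parts.scheme)
    if method is None:
        raise ValueError(f"not a Subversion URL scheme: {parts.scheme!r}")
    # hostname and port are None when the URL omits them (e.g. file:///...).
    return parts.scheme, parts.hostname, parts.port, parts.path, method


print(describe("svn://svn.example.com:3690/repos/trunk"))
print(describe("file:///var/svn/repos/trunk"))
```
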

Revert

If, after examining your changes to a file, you decide you want to undo them, you can use the Revert command. It discards your local modifications and returns the file to the version you last checked out or updated.

Revision

Each time the repository accepts a commit, it creates a new version of the filesystem tree called a revision. Each revision is assigned a unique number, one greater than the number of the previous revision. The initial revision of a freshly created repository is numbered zero and consists of nothing but an empty root directory.
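
The numbering rule can be modeled in a few lines. This is a toy sketch, not Subversion's storage format. The class ToyRepository is hypothetical, and each revision is stored as a full copy of the tree (a mapping from path to contents). That is a deliberate simplification; real repositories store changes far more compactly. The model keeps the key rules: revision 0 is an empty root, and each accepted commit produces a new tree whose number is one greater than the previous one.

```python
class ToyRepository:
    """In-memory model of Subversion-style global revision numbers (sketch)."""

    def __init__(self):
        # Revision 0: an empty root directory, modeled as an empty dict.
        self.revisions = [{}]

    @property
    def youngest(self):
        """Number of the most recent revision."""
        return len(self.revisions) - 1

    def commit(self, changes):
        """Apply {path: contents, or None to delete} as one new revision."""
        tree = dict(self.revisions[-1])  # start from the youngest tree
        for path, contents in changes.items():
            if contents is None:
                tree.pop(path, None)
            else:
                tree[path] = contents
        self.revisions.append(tree)
        return self.youngest  # one greater than the previous revision

    def snapshot(self, rev):
        """The whole tree as it looked right after revision `rev`."""
        return dict(self.revisions[rev])


repo = ToyRepository()
print(repo.commit({"trunk/a.txt": "hello"}))  # 1
print(repo.commit({"trunk/b.txt": "world"}))  # 2
```
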

Revision Graph

A revision graph is a graphical representation of the trunk and of the branches and tags that split off from it. It resembles a tree structure and makes this kind of information easy to view.

Revision number

When you create a new repository, it begins its life at revision zero and each successive commit increases the revision number by one. After a commit is made, the Subversion client informs you of the new revision number. Each revision number has a tree hanging below it and each tree is a snapshot of the way the repository looks after each commit.

Switch

The switch subcommand updates your working copy to mirror a new URL, usually one that shares a common ancestor with your working copy. It is the Subversion way to move a working copy to a new branch.

Tagging

Tagging means placing a label on a snapshot of every file, no matter which revision each file is at. A tag can be created either from a working copy or directly in the repository; the effect is the same.

Update

An update synchronizes the working copy with the latest changes that any user has made to the repository, fetching the latest versions of the files to your local drive. As a rule of thumb, always update a file before making any changes to it.

Working copy

A working copy is the local copy of the files you have fetched from the repository, which you edit and keep up to date. To obtain a working copy, you do a Checkout.

Base Form	Simple Past Tense	Past Participle
awake	awoke	awoken
be	was, were	been
bear	bore	borne/born
beat	beat	beat
become	became	become
begin	began	begun
bend	bent	bent
beset	beset	beset
bet	bet	bet
bid	bid/bade	bid/bidden
bind	bound	bound
bite	bit	bitten
bleed	bled	bled
blow	blew	blown
break	broke	broken
breed	bred	bred
bring	brought	brought
broadcast	broadcast	broadcast
build	built	built
burn	burned/burnt	burned/burnt
burst	burst	burst
buy	bought	bought
cast	cast	cast
catch	caught	caught
choose	chose	chosen
cling	clung	clung
come	came	come
cost	cost	cost
creep	crept	crept
cut	cut	cut
deal	dealt	dealt
dig	dug	dug
dive	dived/dove	dived
do	did	done
draw	drew	drawn
dream	dreamed/dreamt	dreamed/dreamt
drive	drove	driven
drink	drank	drunk
eat	ate	eaten
fall	fell	fallen
feed	fed	fed
feel	felt	felt
fight	fought	fought
find	found	found
fit	fit	fit
flee	fled	fled
fling	flung	flung
fly	flew	flown
forbid	forbade	forbidden
forget	forgot	forgotten
forego (forgo)	forewent	foregone
forgive	forgave	forgiven
forsake	forsook	forsaken
freeze	froze	frozen
get	got	got/gotten
give	gave	given
go	went	gone
grind	ground	ground
grow	grew	grown
hang	hung	hung
hear	heard	heard
hide	hid	hidden
hit	hit	hit
hold	held	held
hurt	hurt	hurt
keep	kept	kept
kneel	knelt	knelt
knit	knit	knit
know	knew	known
lay	laid	laid
lead	led	led
leap	leaped/leapt	leaped/leapt
learn	learned/learnt	learned/learnt
leave	left	left
lend	lent	lent
let	let	let
lie	lay	lain
light	lighted/lit	lighted/lit
lose	lost	lost
make	made	made
mean	meant	meant
meet	met	met
misspell	misspelled/misspelt	misspelled/misspelt
mistake	mistook	mistaken
mow	mowed	mowed/mown
overcome	overcame	overcome
overdo	overdid	overdone
overtake	overtook	overtaken
overthrow	overthrew	overthrown
pay	paid	paid
plead	pleaded/pled	pleaded/pled
prove	proved	proved/proven
put	put	put
quit	quit	quit
read	read	read
rid	rid	rid
ride	rode	ridden
ring	rang	rung
rise	rose	risen
run	ran	run
saw	sawed	sawed/sawn
say	said	said
see	saw	seen
seek	sought	sought
sell	sold	sold
send	sent	sent
set	set	set
sew	sewed	sewed/sewn
shake	shook	shaken
shave	shaved	shaved/shaven
shear	shore	shorn
shed	shed	shed
shine	shone	shone
shoe	shoed	shoed/shod
shoot	shot	shot
show	showed	showed/shown
shrink	shrank	shrunk
shut	shut	shut
sing	sang	sung
sink	sank	sunk
sit	sat	sat
sleep	slept	slept
slay	slew	slain
slide	slid	slid
sling	slung	slung
slit	slit	slit
smite	smote	smitten
sow	sowed	sowed/sown
speak	spoke	spoken
speed	sped	sped
spend	spent	spent
spill	spilled/spilt	spilled/spilt
spin	spun	spun
spit	spit/spat	spit/spat
split	split	split
spread	spread	spread
spring	sprang/sprung	sprung
stand	stood	stood
steal	stole	stolen
stick	stuck	stuck
sting	stung	stung
stink	stank	stunk
stride	strode	stridden
strike	struck	struck
string	strung	strung
strive	strove	striven
swear	swore	sworn
sweep	swept	swept
swell	swelled	swelled/swollen
swim	swam	swum
swing	swung	swung
take	took	taken
teach	taught	taught
tear	tore	torn
tell	told	told
think	thought	thought
thrive	thrived/throve	thrived
throw	threw	thrown
thrust	thrust	thrust
tread	trod	trodden
understand	understood	understood
uphold	upheld	upheld
upset	upset	upset
wake	woke	woken
wear	wore	worn
weave	weaved/wove	weaved/woven
wed	wed	wed
weep	wept	wept
wind	wound	wound
win	won	won
withhold	withheld	withheld
withstand	withstood	withstood
wring	wrung	wrung
write	wrote	written

A Survey and Taxonomy of Approaches for Mining Software Repositories in the Context of Software Evolution

By: H. Kagdi, M. L. Collard, and J. I. Maletic

The authors identified four dimensions in order to objectively describe and compare the different approaches:

The software repositories utilized: what information sources are used?
The purpose of MSR: why mine or what to mine for?
The methodology: how to achieve the purpose of mining from the selected software repositories?
The evaluation of the undertaken approach: how to assess quality?

What types of sources can be considered as software repositories?

source-control systems are used for storing and managing changes to source code artifacts, typically files, under evolution.
defect-tracking systems are used to manage the reporting and resolution of defects/bugs/faults and/or feature enhancements.
archived communications among project participants (e.g., e-mail and mailing lists) are sources of information, including change rationales.

These repositories share a common goal: supporting software evolution by managing the lifecycle of a software change.

Given the primacy of source code change, there are three basic categories of information in a software repository that can be mined:

the software artifacts/versions
the differences between the artifacts/versions
the metadata about the software change

The topics below are the ones I am interested in:

4.9. Classification with supervised learning

4.9.1. Maintenance relevance relations

A classification-learning technique is used by Shirabad et al. [37–39] to determine the co-update relations between a pair of source code files, i.e., given two files determine whether a change in one leads to a change in the other. Such types of relations are also termed maintenance-relevance relations. A decision-tree classifier (i.e., model) is produced by a machine-learning (induction) algorithm. A time-based heuristic is employed to assign a relevant or non-relevant relation between a pair of files to form the learning and testing sets. A fixed time period between time T1 and T2 (T2

rate versus true-positive rate), precision, and recall plots imply that the PR attributes generate better classifiers than the syntactic attributes do. Classifiers generated from the comment attributes do not perform on a par with those generated from the PR attributes; however, they are better than those generated from the syntactic attributes. Classifiers generated from a combination of syntactic and comment attributes produce better results than either set of attributes considered alone.

4.9.2. Triage bug reports

Anvik et al. [63] used supervised learning (a support vector machine algorithm) to recommend a list of potential developers for resolving a bug report (BR). Past reports in the Bugzilla repository are used to produce a classifier. Instead of directly using the assigned-to field of a BR, the authors developed project-specific heuristics for training the classifier. This avoids incorrectly attributing BRs with default assignments, which may not reflect the developer who actually resolved the bug. The approach is evaluated on three open-source projects: Eclipse, Firefox, and GCC. For Eclipse and Firefox, developers who contributed at least nine BR resolutions over the most recent three months were considered in the training set. The precision for Eclipse and Firefox was 57% and 64%, respectively, and the recall was 7% and 2%, respectively. For GCC, precision was 6% when recommending one developer and 18% when recommending two or three developers. Recall for GCC was 0.3%, 2%, and 3% for recommending one, two, and three developers, respectively.
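
The precision and recall figures above follow the usual set-based definitions. Precision is the fraction of recommended developers who are appropriate. Recall is the fraction of appropriate developers who were recommended. The short sketch below computes both. The developer names are made up, and the paper's exact notion of an "appropriate" developer may differ from a plain set of names.

```python
def precision_recall(recommended, relevant):
    """Set-based precision and recall of a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # correct recommendations
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical bug report: three developers recommended, four could have fixed it.
p, r = precision_recall(["ann", "bob", "cho"], ["bob", "dan", "eve", "fay"])
print(p, r)  # 0.333... 0.25
```

The arithmetic shows why precision can be moderate while recall stays low. When the set of relevant developers is large, a short recommendation list can only ever cover a small fraction of it.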

5. DISCUSSION AND OPEN ISSUES

5.1. MSR on fine-grained entities

One major issue is the disparity between the software-evolution data available in the repositories and the needs of the stakeholders, not just researchers but also software maintainers.

The majority of current MSR approaches operate either at the physical level (e.g., system, subsystems, directories, files, lines) or at a fairly high level of logical/syntactic entities (e.g., classes), regardless of the primary focus, i.e., changes of properties or artifacts. In part this is because researchers restrict their approaches/studies to what is directly available and supported by the software repositories (e.g., file and line views of source code and their differences). However, the investigations by Zimmermann et al. [33] have shown the benefits of further processing the information directly available from source code repositories for change-prediction and impact-analysis tasks. In their study [33], there was no significant difference in precision and recall values between file-based and logical entities (i.e., classes, methods, and variables) for change-prediction tasks. However, there is an implicit gain in the context available to the maintainer, for example, the exact location of a predicted change. Predicting a change at the entity level rather than the file level reduces manual effort, as only the predicted entities (versus the whole file) need to be examined.

This leads to the issue of extending current MSR by increasing its source code awareness. The issue of source code awareness is twofold, concerning both the types of MSR questions and the source code artifacts and their differences. For example, on one end, a market-basket question is used to find logical/evolutionary couplings between source code entities. These couplings are termed ‘hidden’ dependencies because they are based solely on the historical information of software changes. However, very little attention has been paid to whether these hidden dependencies correspond to relationships present in well-established source code models (e.g., control-flow graphs, dependency graphs, call graphs, and UML models). We feel that a finer-grained understanding of source code changes is needed to address these types of questions.

Fluri et al. [106] analyzed change-sets from a CVS repository to distinguish changes within source code entities such as classes and methods (termed structural changes) from changes to license updates and white space between source code entities (termed non-structural changes). The goal of their work was to refine evolutionary couplings detected from the version history with this information (i.e., to reduce false positives). Their study on an Eclipse plug-in found over 31% of change-sets with no structural changes and over 51% of change-sets with at least one non-structural change. In one of the rare cases, Ying et al. [34] defined the interestingness measure of an evolutionary coupling based on source code dependencies such as calls, inheritance, and usage. Their study on Eclipse and Mozilla found evolutionary couplings that were not represented by the source code dependencies they considered. We feel that further utilizing such source code dependencies (such as the association and dependency relationships defined in UML) will lead to heuristics and criteria that further reduce false evolutionary couplings. It will also help to detect evolutionary couplings that are prevalent but do not exhibit any source code dependencies (e.g., domain- or developer-induced dependencies). More studies in this direction are needed to realize the exclusive and synergistic contributions of MSR approaches.
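
The "market-basket" question mentioned above can be sketched as association-rule mining over change-sets. Each change-set is treated as a basket of entities changed together. A rule "if A changes, B changes too" is kept when its support (the number of change-sets containing both) and its confidence (support divided by the number of change-sets containing A) pass the thresholds. This is a generic support/confidence sketch, not the exact algorithm of any surveyed tool. The function name, the thresholds and the file names are assumptions. The entities could equally be methods or classes, which is the finer granularity the section argues for.

```python
from collections import Counter
from itertools import combinations


def coupling_rules(change_sets, min_support=2, min_confidence=0.5):
    """Mine 'lhs changes => rhs changes' rules from co-change data (sketch).

    Returns (lhs, rhs, support, confidence) tuples, strongest first.
    """
    item_count = Counter()  # change-sets touching each entity
    pair_count = Counter()  # change-sets touching both entities of a pair
    for cs in change_sets:
        items = set(cs)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


history = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"A.java"},
    {"C.java"},
]
for rule in coupling_rules(history):
    print(rule)
```

With this history, B.java => A.java has confidence 1.0, while A.java => B.java has confidence 2/3. Note that nothing in the rules says whether A and B share any source code dependency, which is exactly the gap the section points out.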

5.2. Historical context: how many versions?

Software repositories bring a rich history of software development and evolution. One goal of MSR is to uncover past successes and failures from historical information and to improve the evolution process of the software system(s) under consideration. However, one needs to be careful when selecting the amount and period of historical data on which to base tools or models supporting a particular aspect of software evolution. Considering development data too far back in the history imposes a risk of irrelevant information: the design or operational assumptions of the system may no longer be similar, or worse, may be entirely different. For example, consider a hypothetical system that has undergone 1000 versions. The information about the changes in the first 50 versions may be totally irrelevant for predicting the changes in version 1001. A series of changes from version 50 to version 200 could be attributed to an unstable unit in the system that has since stabilized. On the other hand, considering too few versions of the system imposes the risk of being incomplete or missing important relevant information, thus producing few useful results. For example, a current version of a system may be in the middle of a refactoring that is achieved by a sequence of changes (versions). The minimum requirement would then be the past versions from the point where the refactoring started, first to confirm the kind of refactoring taking place and then to predict the remaining steps. The number of versions to mine depends on the task and on the current state/phase of the system under consideration.

5.3. Threats to validity in MSR

MSR approaches use a variety of software repositories, ask different questions, and draw conclusions within the context of the conducted study. All of these factors are subject to threats to validity. Gasser et al. [16] identified the challenges associated with a need common among researchers: selecting, gathering, and maintaining the raw data of open-source projects for their respective investigations. They suggested a research infrastructure to deal with such challenges and to serve as a benchmark to facilitate comparative and collaborative research. They discussed the infrastructure with regard to representation standards for data and metadata available in various software repositories, linking them, the required tools, and a centralized data repository. German et al. further suggested a set of projects representing various sizes and domains, their extracted source code facts (i.e., syntax and semantics), and the period of considered history and observation for these projects to be benchmarked [10,18]. We call for a comparative framework to objectively compare MSR approaches with regard to the aspects of software evolution, MSR questions, and the results. Such a framework will facilitate more generic conclusions in MSR research. Currently, it is difficult to tell whether two independent MSR investigations are asking equivalent questions or studying the same or a similar aspect of software evolution. A benchmark of this nature would help address the expressiveness and effectiveness of MSR in improving software evolution.

J. Neville, F. Provost, "Predictive Modeling with Social Networks"

Jennifer Neville, Purdue University
Foster Provost, New York University

Abstract
Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks. The dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding how to market a product to people in MySpace or Facebook, it may be helpful to consider whether a person's friends are likely to purchase the product.

This tutorial will explore the unique opportunities and challenges for modeling social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., marketing, fraud detection). We will then present a number of characteristics of social network data that differentiate it from traditional inference and learning settings, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of the opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data. We will give links to the recent literature to guide study, and present results demonstrating the effectiveness of the techniques.

Prerequisites: The tutorial assumes a basic knowledge of AI-style inference and machine learning, equivalent to an introductory graduate or advanced undergraduate class.

Ref: http://www.sigkdd.org/kdd2008/tutorials.html
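
The abstract's example (is a person likely to buy, given whether their friends are likely to buy?) can be illustrated with a simple homophily-based estimate. The sketch is similar in spirit to the weighted-vote relational-neighbor (wvRN) classifier. It is not taken from this tutorial. People whose behavior is known keep their label (1.0 bought, 0.0 did not). Everyone else starts at an assumed prior. Each unknown score is then repeatedly replaced by the weighted average of that person's friends' current scores. The function name, the prior, the iteration count and the toy network are all hypothetical.

```python
from collections import defaultdict


def relational_neighbor_scores(edges, known, prior=0.5, iterations=20):
    """Estimate P(buys) for each person from friends' (estimated) behavior.

    edges: undirected (person, friend, weight) tuples with weight > 0.
    known: person -> 1.0 (bought) or 0.0 (did not buy); these stay fixed.
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    scores = {p: known.get(p, prior) for p in neighbors}
    for _ in range(iterations):
        updated = dict(scores)
        for person, friends in neighbors.items():
            if person in known:
                continue  # observed labels are not re-estimated
            total = sum(w for _, w in friends)
            updated[person] = sum(w * scores[f] for f, w in friends) / total
        scores = updated
    return scores


edges = [("ann", "bob", 1.0), ("ann", "cho", 1.0),
         ("fay", "bob", 1.0), ("fay", "eve", 1.0), ("dan", "eve", 1.0)]
known = {"bob": 1.0, "cho": 1.0, "eve": 0.0}
scores = relational_neighbor_scores(edges, known)
print(scores["ann"], scores["fay"], scores["dan"])  # 1.0 0.5 0.0
```
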

Mining Recurrent Activities: Fourier Analysis of Change Events

By: Abram Hindle, Michael W. Godfrey, and Richard C. Holt

The periods used in time-series analysis are mostly days, months, or years. As yet there has been no research to validate that these assumptions are reasonable. Therefore, Fourier analysis is used to find the "natural" periodicities of software development. Fourier analysis can detect and determine the periodicity of repeating events.

ICSE 2009
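
Here is a minimal sketch of how a Fourier transform exposes a "natural" period in change events. It assumes a hypothetical series of daily change counts and computes a plain discrete Fourier transform (O(n²), written out for clarity; an FFT would be used in practice). The sketch skips the constant (k = 0) component, finds the frequency bin k with the largest magnitude, and reports the corresponding period n / k in days. This is a generic illustration, not the authors' exact procedure.

```python
import cmath


def dominant_period(counts):
    """Period (in samples) of the strongest non-constant DFT component."""
    n = len(counts)
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0, the mean level
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(counts))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


# Hypothetical: 8 weeks of daily commits, busy on weekdays, quiet at weekends.
weekly = [5, 5, 5, 5, 5, 0, 0] * 8
print(dominant_period(weekly))  # 7.0
```

On real data the strongest peak need not fall on 7 days or a calendar month. That is exactly the assumption the paper sets out to check.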

Vocabulary

mitigate: to lessen, to make milder

duality: the state of existing as a pair

coefficient: a numerical factor that multiplies a variable or term

vague: unclear, ambiguous

Thanks to Lexitron

Definition of terms

approach: a way of reaching a goal; a broad direction

method: a way of doing something; a working procedure

Linear Predictive Coding and Cepstrum coefficients for mining time variant information from software repositories

By: Giuliano Antoniol, Vincenzo Fabio Rollo, and Gabriele Venturi

Abstract: proposes an approach to recover time-variant information from software repositories.

There is currently a lack of techniques and methods for extracting time-dependent information. By contrast, signal processing, image processing, and speech recognition already have methods for representing how data differs at a given point in time.

This work uses LPC and Cepstrum coefficients to model time-varying software artifact histories, highlighting components and artifacts that evolved in the same or similar ways.

MSR 2005
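
Here is a sketch of the two signal-processing steps named above, applied to a hypothetical time series such as monthly change counts for one component. The first step computes linear predictive coding (LPC) coefficients a_1..a_p, which predict x[t] from the previous p values. It uses the textbook autocorrelation method with the Levinson-Durbin recursion. The second step derives cepstral coefficients from the LPC model using the standard LPC-to-cepstrum recursion. The paper's own choices of order, windowing and normalization may differ, and the function names here are hypothetical. One could then compare two artifact histories by the distance between their cepstral vectors.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_t x[t] * x[t - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def lpc(x, order):
    """LPC coefficients a_1..a_p (x[t] ~ sum_j a_j * x[t-j]) via Levinson-Durbin.

    Returns (coefficients, residual prediction-error energy).
    Assumes x is not identically zero (r[0] > 0).
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a[1:], err


def lpc_cepstrum(a, n_coeffs):
    """Cepstral coefficients c_1..c_n of 1 / (1 - sum_k a_k z^-k).

    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k/n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    The gain term c_0 is omitted in this sketch.
    """
    p = len(a)
    c = [0.0] * (n_coeffs + 1)
    for n in range(1, n_coeffs + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Hypothetical decaying activity history: x[t] = 0.9 ** t.
series = [0.9 ** t for t in range(200)]
coeffs, _ = lpc(series, 1)
print(coeffs)                    # approximately [0.9]
print(lpc_cepstrum([0.9], 3))    # approximately [0.9, 0.405, 0.243]
```
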

Coming soon

"A Survey and Taxonomy of Approaches for Mining Software Repositories in the Context of Software Evolution"

Things to check before submitting your files

1. File naming

1.1 Name the "Data Communication" file as follows:

2[room]-[student name]-[number], for example:

A student in class M.2/0 named พี่อู๊ด (Phi Ood), number 3, must name the file

"20-พี่อู๊ด-3"

1.2 Name the file that has been formatted as a small booklet as follows:

2[room]little-[student name]-[number], for example:

"20little-พี่อู๊ด-3"

2. In the e-mail you send, don't forget to add the dot (.) after the number 4.

3. If the file is password-protected, send the password as well.

4. Send the e-mail before midnight on Sunday, 20 February 2011 (B.E. 2554).

5. After sending, wait for a reply e-mail, which will remind you to check what you sent. If you have followed items 1-3, you do not need to send the e-mail again.
